Dataflow analysis - you will use llvm to implement two


Dataflow Analysis

Objective

This lab will familiarize you with writing static program analyses using the LLVM compiler infrastructure. LLVM is a collection of compiler and analysis toolchain utilities widely used in the software analysis community. You will use LLVM to implement two intra-procedural dataflow analyses, one forward (reaching definitions analysis) and one backward (liveness analysis). In LLVM, these are referred to as passes over the code.

Lab Setup

1. Download and extract the lab code found on Canvas in the dataflow.zip archive.

2. Navigate to dataflow/build and run the following commands:

cmake ..
make clean
make

You should now see libDataflowPass.so under dataflow/build/Dataflow.

3. Go to the dataflow/example directory and generate LLVM bitcode from the programs we will analyze with the following commands:

clang -emit-llvm ArrayDemo.c -c -o ArrayDemo.bc
clang -emit-llvm Greatest.c -c -o Greatest.bc

4. Run a Dataflow pass using the command below to ensure everything works as expected for the test program ArrayDemo.c. This pass demonstrates the API by printing the definitions, uses, predecessors, and successors of each instruction.

opt -load ../build/Dataflow/libDataflowPass.so -Printer < ArrayDemo.bc > /dev/null

In addition to testing your setup, the printer pass (in Printer.cpp) is a very useful reference for implementing your own passes.

Lab Instructions

Complete the doAnalysis method in the ReachDefAnalysis.cpp and LivenessAnalysis.cpp (located in dataflow/Dataflow/) skeleton files to implement the two analyses. Do not write your analysis code outside of these files, as these files are the only ones you will submit. You may use the C++ Standard Template Library (STL) when implementing your analyses, but you may not use other third party libraries.

Your code will need to iterate over the program points in the input program and store the computed dataflow facts in DataflowAnalysis::inMap and DataflowAnalysis::outMap. Both analyses inherit from the base class DataflowAnalysis, which you can find in the header file DataflowAnalysis.h located in the directory dataflow/Dataflow/. Besides including useful classes such as SetVector and ValueMap, DataflowAnalysis.h also defines useful utility functions such as getPredecessors, getSuccessors, and isDef.

LLVM passes are performed an intermediate representation (LLVM bitcode) generated from the program source code. LLVM bitcode is generated in Static Single Assignment (SSA) form, a common simplification used by compiler infrastructure. In SSA form, each variable is assigned exactly once, and every variable is defined before it is used. Variables are represented directly by the instruction defining it. In fact, the Instruction class is a subtype of the Value class. You can check whether an instruction defines a variable by checking getType()->isVoidTy(). An entry into the inMap or outMap is the instruction as opposed to a variable. The lectures speak to variables being stored (x or y) but with the SSA representation being used, the instruction itself is a proxy for the variable. That is why the inMap and outMap all show the instructions in the printer output.

Since each variable is uniquely defined, variables will never be redefined in SSA form. This will affect the generation of KILL sets in your implementation of reaching definitions analysis, specifically, they will always be empty.

After completing the two doAnalysis methods, rebuild the analyses using the commands from setup step 2, and then rerun the analyses using following commands to print the results of the analyses on the ArrayDemo program:

opt -load ../build/Dataflow/libDataflowPass.so -ReachDef < ArrayDemo.bc > /dev/null opt -load ../build/Dataflow/libDataflowPass.so -Liveness < ArrayDemo.bc > /dev/null

If your implementation is correct, your output will match the example output in ArrayDemo_ReachDef and ArrayDemo_Liveness found in dataflow/example/. The order of elements in the IN and OUT sets does not matter, but the number of elements and the values should match exactly.

We have also included another program, Greatest.c, and it's expected outputs for testing your implementation. You can use commands similar to those above to analyze this program. We also encourage students to develop their own test cases / corresponding output and share them on Piazza, although we will not be validating them for correctness.

Additionally, we have provided images of the control flow graphs (CFGs) for both programs, named array_demo_graph.png and greatest_graph.png. Each instruction is represented by a node, and edges represent an instruction's successor instruction(s). These graphs are useful for reasoning about the correctness of your IN and OUT sets for the sample program analyses. We recommend that you take a moment to review the graph and ensure you understand the structure of each program. Pay close attention to the instructions that have multiple successor and/or predecessor instructions.

Helpful LLVM API Information

ValueMap has a similar interface to std::map. For example, you can access or insert an element using the [] operator:

ValueMap vm; Instruction* I = /*...*/; vm[I] = 5; // inserts to the map

LLVM will generate all of the necessary objects for your analyses according to its object model (see DataflowAnalysis.cpp). You will not need to create new elements to insert into DataflowAnalysis::inMap or DataflowAnalysis::outMap, your code should add the pointer to the object to the ValueMap.

SetVector has a similar interface to std::vector, except that it does not permit inserting duplicate elements. The == operator returns false for two SetVector objects containing the same elements in different orders.

Also, functions in the library from the STL work with SetVector. For example:
SetVector sv; for( int i = 0; i < 5; i++ )
sv.insert(i); sv.insert(0); // has no effect if (std::all_of(sv.begin(), sv.end(), isPositive))
// all_of is from printf("All numbers in sv are positive!");

assuming isPositive is defined elsewhere as:
             bool isPositive(int i) {return i > 0;}

Items to Submit
Submit the following files in a single compressed file (.zip format) named lab3.zip. For full credit, there must not be any subfolders or extra files contained within your zip file.
1. LivenessAnalysis.cpp
2. ReachDefAnalysis.cpp

Attachment:- dataflow.zip

Request for Solution File

Ask an Expert for Answer!!
Dissertation: Dataflow analysis - you will use llvm to implement two
Reference No:- TGS02801362

Expected delivery within 24 Hours