Com4521com6521 parallel computing with gpus the aim of the


Introduction-

The aim of the assignment is to test your understanding and technical ability of implementing efficient code on the GPU with CUDA. You will be expected to benchmark and optimise the implementation of a simple rule based simulation. You will start by implementing a serial CPU version, you will then parallelise this version for multi-core processors using OpenMP, before finally implementing the same simulation in CUDA. The emphasis of the assignment is on how you have optimised and progressively improved your code in order to implement the simulation efficiently. In order to demonstrate this you are expected to document the processes of benchmarking and improving your code by producing a written report. The written report should show the performance improvements of your code and demonstrate that you understand what limiting factors affect the performance at each stage of your implementation. Handing in just a piece of code with excellent performance will not score highly in the assessment unless you have demonstrated in understanding in the written report of how you have progressively refined your implementation to achieve the final solution.

The Task-

Conway's Game of Life1 is a simple cellular automaton which describes a rule based game in which a player describes only the initial system state. A cellular automaton is a discrete model which consists of a population of regularly spaced cells each with a number of states. By communicating with neighbouring cells, a new generation of the population (at t + 1)   can be generated by applying a set of rules which use   the cell's current state and the cells neighbouring states to determine the next state of the current cell. With respect to the Game of Life, the specific rules of the game are very simple.  A cell within the population is either in an alive or dead state. At each time step, a cell considers its Moore    neighborhood   of immediately adjacent neighbors, including diagonal neighbors,   of   which there are 8 in total (see Figure 1).

1144_Figure.png

For any live cells the following rules are applied in determining the state of the cell at ?? + 1;

1) If there are fewer than two live neighbours; the cell dies of loneliness.

2) If there are two or three live neighbours; the cell remains alive.

3) If there are four or more live neighbours; the cell will die due to overcrowding.

For any dead cells the following rules are applied in determining the state of the cell at t+1;

1) If there are exactly three live neighbours; the dead cell becomes alive due to reproduction.

2) For any other configuration of neighbours the cell remains dead.

The implementation of the task is split into three parts. The first part of the assignment requires implementing the game of life simulation. The second part of the assignment considers how to count a number of patterns observed during simulation. The third part allows you to pick a specific extension to implement within the program.

Part 1 Requirements:

You should start from the provided source code and complete the TODO sections. Your program should accept the arguments described by the existing print_help() function. The file format for Game of Life files (both input and output) should use plain text with a line width and number of lines matching the program arguments W and H respectively. Each line should be terminated with a carriage return ('\n'). A single character should be used to encode the state of each grid cell. If the cell is dead the character should have a space value (i.e. character ' ' with ASCII code #32). If the cell is alive the character should have a value of '#' (i.e. ASCII code #35). Using this plain text format you can visually check the state of any Game of Life file.

You should implement a serial C version of the code and make this as efficient as you can. Once implemented your serial version should act as a baseline to measure potential performance speedup of your parallel versions. Note: It is not permitted to use data structures such as trees to spatially reduce the grid. You code must explicitly store a 2D set of state data for each cell and progress the simulation by considering the Moore neighbourhood.

You should implement a multi core parallel version of the same simulation which uses OpenMP.

Finally, you should implement a CUDA version of the simulation. You should consider how using different approaches for memory caching affect performance over a range of problem sizes. You should also consider how boundary conditions (overlapping regions between thread blocks) are handled and give an overview of any approaches you have applied to improve boundary conditions performance.

For all versions of the implementation the environment should have wrapping bounds. This means that the environment should be 2D but form a 3D torus where cells on the extreme right side of the environment are neighbours to cells on the extreme left, and cells on the extreme top side of the environment are neighbours to cells on the extreme bottom (Figure 2).

1250_Figure1.png

Part 2 Requirements:

It is required that you update your three implementations (serial, CPU and GPU) to be able to provide analysis of the simulation by recognising common patterns. For example, there are a number of patterns which once created in the Game of Life will remain static unless their immediate neighbours are modified. These static patterns are often referred to as a still life patterns. For the purposes of the assignment it is required to know the total number of still life Cross patterns (Figure 3) for any given iteration.

1175_Figure2.png

The Game of Life can also exhibit oscillating patterns. These oscillating patterns are classed by the period (number of iterations) which is required for them to repeat. For the purposes of the assignment it is required to know the total number of Blinker patterns (period 2) which are exhibited for any given iteration. See Figure 4.

40_Figure3.png

A Glider is a special case of an oscillating pattern which migrates across the screen as it oscillates.   For the purposes of the assignment it is required to know the total number of Gliders which are exhibited for any given iteration. Figure 5 shows a the 4 patterns that are exhibited by a Gliders over 4 iterations when travelling in a south east direction. You are required to calculate the number of gliders heading in any direction. In total there are 16 unique patterns which can be obtained by rotating each of the patterns in Figure 5 by 90, 180 and 270 degrees.

1778_Figure4.png

Note: that for all pattern recognition tasks (Cross, Blinkers and Gliders), it is important that the pattern and surrounding 5x5 neighbours match those shown in the figures exactly. For example when matching a pattern both the live and dead cells must represent the 5x5 patterns exactly. I.e. there must be 25 exactly matching cells. Figure 6 shows some examples of patterns which should  not be matched.

2336_Figure5.png

In order to implement counting of patterns you will be required to implement a form of 2D parallel reduction. You should consider different approaches (e.g. recursion, shared memory and warp shuffling) for implementing this reduction and describe how the performance of each suits the technique you have implemented.

Part 3:

For the final part of the assignment you have the freedom to implement ONE of the following more advanced changes to the program.

  • Real time visualisation using CUDA OpenGL interoperability.

 • Modify the simulation to recognise arbitrary patterns of any size and any period. Patterns should be provided as input to the simulation. You will need to create an input file to describe all possible patterns and their names.

  • Extend the simulation to three dimensions. You will need to modify the rules to ensure that the simulation does not die out or demonstrate explosive growth.

You should allow your extension to the program to execute ONLY when using some additional command line option. You should update the print_help() function to describe how this works. As with the previous parts of the assignment you should benchmark your code to demonstrate the performance implications of your implementation.

Attachment:- Assignment.rar

Request for Solution File

Ask an Expert for Answer!!
Programming Languages: Com4521com6521 parallel computing with gpus the aim of the
Reference No:- TGS01381602

Expected delivery within 24 Hours