Parallel Computer Architecture Programming Assignment: Equation Solver

Consider the Gauss-Seidel equation solver discussed in the lecture notes on writing parallel programs (see the file parallelization process.pdf on BBLearn). The source code for the reference implementation is in the zip file on BBLearn.

Recall that the order in which the grid points are updated in the sequential algorithm is not fundamental to the Gauss-Seidel solution method; it is simply one possible ordering that is convenient to program sequentially. Since the Gauss-Seidel method is not an exact solution method but iterates until convergence, we can update the grid points in a different order as long as updated values are used frequently enough. The Jacobi method goes further: it never uses values updated in the current iteration, and instead computes every grid point from the values as they were at the end of the previous iteration. Because each update then depends only on the previous iteration, all grid points can be updated independently and in parallel. Using the sequential program as a starting point, develop a parallel version of the Jacobi method using an element-based decomposition strategy, in which each GPU thread is responsible for processing a single grid element.
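The difference between the two update orders can be sketched in plain C. The sketch also works as a host-side reference when checking a GPU result. It makes two assumptions that should be checked against the provided reference code. First, the grid is a dim × dim row-major array whose outermost ring holds fixed boundary values. Second, the update rule is the five-point average (0.2 times the point plus its four neighbours) used in the lecture's Gauss-Seidel example. The function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Assumed layout: dim x dim floats, row-major, outer ring = fixed boundary.
 * Assumed update rule: 0.2 * (point + up + down + left + right). */

/* Gauss-Seidel: updates in place, so a point can read neighbours that were
 * already updated earlier in this sweep. Returns the total absolute change. */
float gauss_seidel_sweep(float *a, int dim)
{
    assert(dim >= 3);
    float diff = 0.0f;
    for (int i = 1; i < dim - 1; i++)
        for (int j = 1; j < dim - 1; j++) {
            int k = i * dim + j;
            float old = a[k];
            a[k] = 0.2f * (old + a[k - dim] + a[k + dim] + a[k - 1] + a[k + 1]);
            diff += fabsf(a[k] - old);
        }
    return diff;
}

/* Jacobi: reads only src (previous iteration) and writes only dst, so the
 * interior points can be updated in any order, or all at once. */
float jacobi_sweep(const float *src, float *dst, int dim)
{
    assert(dim >= 3 && src != dst);
    float diff = 0.0f;
    for (int i = 1; i < dim - 1; i++)
        for (int j = 1; j < dim - 1; j++) {
            int k = i * dim + j;
            float v = 0.2f * (src[k] + src[k - dim] + src[k + dim] + src[k - 1] + src[k + 1]);
            dst[k] = v;
            diff += fabsf(v - src[k]);
        }
    return diff;
}
```

Because `jacobi_sweep()` never reads `dst`, its loop order is irrelevant. That is the freedom an element-based GPU decomposition relies on. The sweep writes only interior points, so `dst` should start with the same boundary values as `src`.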

The program provided to you accepts no arguments. It creates a randomly initialized grid of N × N elements and applies the update rule to each element of the grid until the specified convergence criterion is satisfied. It then compares the solution computed on the GPU with the one computed on the CPU and prints the relevant statistics.
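Here is a minimal host-side driver for the iterate-until-convergence loop. It assumes that convergence is reached when the mean absolute change per interior point falls below a tolerance. The criterion and tolerance in the provided program may differ. It also assumes the same dim × dim grid with a fixed boundary ring and the five-point averaging rule. After each sweep the two buffers are swapped rather than copied. `max_abs_diff()` shows one simple statistic for comparing GPU and CPU grids. All names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* One Jacobi sweep (assumed five-point averaging rule); returns total |change|. */
static float jacobi_sweep(const float *src, float *dst, int dim)
{
    float diff = 0.0f;
    for (int i = 1; i < dim - 1; i++)
        for (int j = 1; j < dim - 1; j++) {
            int k = i * dim + j;
            float v = 0.2f * (src[k] + src[k - dim] + src[k + dim] + src[k - 1] + src[k + 1]);
            dst[k] = v;
            diff += fabsf(v - src[k]);
        }
    return diff;
}

/* Iterates until the mean absolute change per interior point drops below tol
 * (assumed criterion) or max_iters is reached. scratch must hold dim*dim
 * floats. Returns the iteration count; the final grid is left in a. */
int jacobi_solve(float *a, float *scratch, int dim, float tol, int max_iters)
{
    assert(dim >= 3);
    size_t bytes = (size_t)dim * dim * sizeof(float);
    memcpy(scratch, a, bytes);            /* both buffers need the same boundary */
    float *cur = a, *next = scratch;
    float n_interior = (float)(dim - 2) * (float)(dim - 2);
    int iters = 0;
    while (iters < max_iters) {
        float diff = jacobi_sweep(cur, next, dim);
        float *tmp = cur; cur = next; next = tmp;   /* new values become the input */
        iters++;
        if (diff / n_interior < tol)
            break;
    }
    if (cur != a)                         /* odd iteration count: result is in scratch */
        memcpy(a, cur, bytes);
    return iters;
}

/* Largest element-wise absolute difference, e.g. between GPU and CPU grids. */
float max_abs_diff(const float *x, const float *y, int count)
{
    float m = 0.0f;
    for (int k = 0; k < count; k++) {
        float d = fabsf(x[k] - y[k]);
        if (d > m)
            m = d;
    }
    return m;
}
```

The same pointer swap can be applied to two device buffers, so the grid never has to be copied between iterations. Only the scalar difference needs to reach the host each iteration for the convergence test.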

Answer the following questions.

1. Edit the compute_on_device() function in solver.cu and the solver_kernel_naive() function in solver_kernel.cu to complete the functionality of the equation solver on the GPU using only global memory.
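An element-based kernel that uses only global memory can be sketched on the host in C. `jacobi_element()` does the work of one thread. It derives its row and column from block and thread indices, as a CUDA kernel would from `blockIdx`, `threadIdx` and `blockDim`. It then adds one to skip the boundary and returns early if it falls past the edge of the grid. Several things here are assumptions for illustration: the `dim2` struct, the per-element `change` array used for the convergence sum, the five-point averaging rule, the dim × dim grid with a boundary ring, and all names. A real kernel would reduce the changes on the device instead, for example with `atomicAdd` or a per-block reduction.

```c
#include <assert.h>
#include <math.h>

/* Stand-in for CUDA's built-in blockIdx / threadIdx / blockDim values. */
typedef struct { int x, y; } dim2;

/* Work done by one "thread": update one interior point, record its change. */
static void jacobi_element(const float *src, float *dst, float *change, int dim,
                           dim2 block_idx, dim2 thread_idx, dim2 block_dim)
{
    int col = block_idx.x * block_dim.x + thread_idx.x + 1; /* +1 skips the boundary */
    int row = block_idx.y * block_dim.y + thread_idx.y + 1;
    if (row >= dim - 1 || col >= dim - 1)                    /* past the edge: no work */
        return;
    int k = row * dim + col;
    float v = 0.2f * (src[k] + src[k - dim] + src[k + dim] + src[k - 1] + src[k + 1]);
    dst[k] = v;
    change[k] = fabsf(v - src[k]);
}

/* Emulates launching enough bs x bs blocks to cover the interior, then sums
 * the per-element changes serially (a GPU version would reduce on device). */
float launch_naive(const float *src, float *dst, float *change, int dim, int bs)
{
    assert(dim >= 3 && bs > 0);
    int n = dim - 2;
    dim2 grid = { (n + bs - 1) / bs, (n + bs - 1) / bs };   /* ceiling division */
    dim2 block = { bs, bs };
    for (int k = 0; k < dim * dim; k++)
        change[k] = 0.0f;
    for (int by = 0; by < grid.y; by++)
        for (int bx = 0; bx < grid.x; bx++)
            for (int ty = 0; ty < bs; ty++)
                for (int tx = 0; tx < bs; tx++)
                    jacobi_element(src, dst, change, dim, (dim2){bx, by},
                                   (dim2){tx, ty}, block);
    float diff = 0.0f;
    for (int k = 0; k < dim * dim; k++)
        diff += change[k];
    return diff;
}
```

The early-return bounds check is what lets block sizes that do not evenly divide the grid still produce correct results.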

2. Improve the performance of the kernel developed in the previous step by using shared memory on the GPU. Edit the kernel function solver_kernel_optimized() in solver_kernel.cu to complete the functionality.
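One common way to use shared memory here is tiling, which is assumed below rather than taken from the reference solution. Each thread block copies its bs × bs tile plus a one-point halo into shared memory, synchronizes, and then computes every update from the tile. In the naive kernel, each value is read by up to five threads. With tiling, it is fetched from global memory roughly once per block. The C sketch emulates this two-phase structure on the host. A local buffer stands in for the `__shared__` array, and a comment marks where `__syncthreads()` would go. It assumes a dim × dim grid with a fixed boundary ring and the five-point averaging rule. The names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Emulates one bs x bs thread block; tile holds (bs+2)*(bs+2) floats. */
static float process_block(const float *src, float *dst, int dim, int bs,
                           int bx, int by, float *tile)
{
    int n = dim - 2;                     /* interior points per dimension */
    int tw = bs + 2;                     /* tile width including the halo */
    int row0 = by * bs, col0 = bx * bs;  /* grid coordinates of the tile corner */

    /* Phase 1: load tile + halo (in CUDA, the block's threads share this work). */
    for (int ty = 0; ty < tw; ty++)
        for (int tx = 0; tx < tw; tx++) {
            int r = row0 + ty, c = col0 + tx;
            tile[ty * tw + tx] = (r < dim && c < dim) ? src[r * dim + c] : 0.0f;
        }
    /* __syncthreads() would go here, before any thread reads the tile. */

    /* Phase 2: each thread updates one interior point using only the tile. */
    float diff = 0.0f;
    for (int ty = 0; ty < bs; ty++)
        for (int tx = 0; tx < bs; tx++) {
            int r = row0 + ty + 1, c = col0 + tx + 1;
            if (r > n || c > n)          /* past the edge of the interior */
                continue;
            int t = (ty + 1) * tw + (tx + 1);
            float v = 0.2f * (tile[t] + tile[t - tw] + tile[t + tw] + tile[t - 1] + tile[t + 1]);
            dst[r * dim + c] = v;
            diff += fabsf(v - tile[t]);
        }
    return diff;
}

/* Runs every block of the emulated launch; returns total |change|. */
float launch_tiled(const float *src, float *dst, int dim, int bs)
{
    assert(dim >= 3 && bs > 0);
    int n = dim - 2;
    int blocks = (n + bs - 1) / bs;
    float *tile = malloc((size_t)(bs + 2) * (bs + 2) * sizeof(float));
    assert(tile != NULL);
    float diff = 0.0f;
    for (int by = 0; by < blocks; by++)
        for (int bx = 0; bx < blocks; bx++)
            diff += process_block(src, dst, dim, bs, bx, by, tile);
    free(tile);
    return diff;
}
```

In CUDA, a statically declared `__shared__` array needs a compile-time size, so the block size is typically fixed at compile time. Alternatively, the tile can be allocated as dynamic shared memory through the third launch-configuration parameter.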

3. Upload all of the files needed to run your code to BBLearn as a single zip file. Submit a short report describing: (1) the design of your kernels, using code or pseudocode to clarify the discussion; (2) the speedup obtained over the serial version by both the naive and optimized kernels for grid sizes of 2048 × 2048, 4096 × 4096, and 8192 × 8192; and (3) the sensitivity of your kernels' execution time to the thread-block size.
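For the report, it may help to tabulate what each thread-block size implies for the launch. Take an n × n set of updated points and a square bs × bs block. The hypothetical helper below computes four values: the number of blocks per dimension, the threads per block (CUDA allows at most 1024), the shared memory for a float tile with a one-point halo, and how many launched threads fall outside the grid and do no work. `speedup()` is simply serial time divided by parallel time.

```c
#include <assert.h>
#include <stddef.h>

/* Launch geometry for a square bs x bs block over an n x n interior,
 * assuming float elements and a one-point halo in the tiled kernel. */
typedef struct {
    int blocks_per_dim;     /* ceil(n / bs) */
    int threads_per_block;  /* bs * bs; CUDA caps this at 1024 */
    size_t tile_bytes;      /* (bs + 2)^2 floats of shared memory per block */
    long idle_threads;      /* launched threads that fall outside the grid */
} launch_info;

launch_info describe_launch(int n, int bs)
{
    assert(n > 0 && bs > 0);
    launch_info li;
    li.blocks_per_dim = (n + bs - 1) / bs;
    li.threads_per_block = bs * bs;
    li.tile_bytes = (size_t)(bs + 2) * (bs + 2) * sizeof(float);
    long covered = (long)li.blocks_per_dim * bs;   /* threads per dimension */
    li.idle_threads = covered * covered - (long)n * n;
    return li;
}

/* Speedup of a parallel run over the serial reference. */
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}
```

For example, with n = 2048 a 16 × 16 block gives 128 blocks per dimension, a 1296-byte tile and no idle threads. A 24 × 24 block leaves 65,792 threads idle, because 2048 is not a multiple of 24.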
