Describe what would need to be done in order to replace


a) Consider this loop:
a[0] = 0;
for (i = 1; i < n; i++)
    a[i] = a[i-1] + i;
Since a[i] can't be computed without the value of a[i-1], the loop has a
loop-carried dependence. Determine how this dependence could be eliminated
so that the loop could be parallelized (for example, with OpenMP, although
you do not have to write the OpenMP code).
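One possible way to remove the dependence, assuming the loop is exactly as written, is to unroll the recurrence: a[i] = 0 + 1 + ... + i = i*(i+1)/2. Each element then depends only on its own index, so the iterations become independent and can be split across threads. The function names fill_serial and fill_closed_form are hypothetical, and the sketch is plain C, which also compiles as CUDA host code. If the added term were an arbitrary value b[i] instead of i, no closed form would exist, and the usual alternative is a parallel prefix sum (scan).

```cuda
// Original loop: iteration i reads a[i-1], which iteration i-1 writes,
// so the iterations must run in order.
void fill_serial(long *a, long n)
{
    if (n <= 0) return;
    a[0] = 0;
    for (long i = 1; i < n; i++)
        a[i] = a[i - 1] + i;
}

// Dependence-free version: unrolling the recurrence gives
// a[i] = 0 + 1 + ... + i = i * (i + 1) / 2, which depends only on i.
// With OpenMP enabled (e.g. -fopenmp, or -Xcompiler -fopenmp under nvcc)
// the iterations are split across threads; otherwise the pragma is
// ignored and the loop runs serially with the same result.
void fill_closed_form(long *a, long n)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = i * (i + 1) / 2;
}
```

Both functions fill the array with the same values; only the second can safely run its iterations in any order.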
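b) Consider the following portion of CUDA reduction code:
reduce1<<<nBlocks, nThreads>>>(dev_array_orig, dev_array_new);  // nThreads (threads per block) is an assumed name
cudaMemcpy(host_array_new, dev_array_new, sizeof(int)*N,
    cudaMemcpyDeviceToHost);
for (i = 1; i < nBlocks.x; i++)  // start at 1: element 0 already holds block 0's partial sum
    host_array_new[0] += host_array_new[i];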
Again, you do not have to write code. Describe what would need to be done in order to replace the for loop with another CUDA kernel call that would implement a final reduction into just one element.
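A minimal sketch of such a second kernel, assuming (as the CPU loop implies) that reduce1 leaves one partial sum per block in dev_array_new[0] through dev_array_new[nBlocks.x - 1]. The names reduce_final, final_sum and FINAL_THREADS are hypothetical. The kernel is launched with a single block, so all partial sums can be combined through that block's shared memory, with __syncthreads() barriers between the steps of a tree reduction. Each thread first adds up a strided subset of the partials, so the sketch still works when nBlocks.x is larger than the block size. After the launch, only one int needs to be copied back to the host.

```cuda
#include <cuda_runtime.h>

#define FINAL_THREADS 256  // assumed block size; must be a power of two

// Hypothetical final-reduction kernel: one block folds the per-block
// partial sums partial[0..count-1] written by reduce1 into partial[0].
__global__ void reduce_final(int *partial, int count)
{
    __shared__ int sdata[FINAL_THREADS];
    int tid = threadIdx.x;

    // Each thread sums a strided subset first, so count may exceed blockDim.x.
    int sum = 0;
    for (int i = tid; i < count; i += blockDim.x)
        sum += partial[i];
    sdata[tid] = sum;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Every read of partial[] happened before the first __syncthreads(),
    // so thread 0 can safely write the total back into element 0.
    if (tid == 0)
        partial[0] = sdata[0];
}

// Host-side replacement for the CPU loop: launch one block, then copy back
// a single int (error checking omitted for brevity).
int final_sum(int *dev_partial, int count)
{
    reduce_final<<<1, FINAL_THREADS>>>(dev_partial, count);
    int result = 0;
    cudaMemcpy(&result, dev_partial, sizeof(int), cudaMemcpyDeviceToHost);
    return result;
}
```

In the original fragment, the cudaMemcpy of N ints and the for loop would then be replaced by a call such as int total = final_sum(dev_array_new, nBlocks.x);. A single block suits this step because there is only one partial sum per block of the first launch. An alternative is to have each block of reduce1 atomicAdd its partial sum into one global counter, at the cost of serialized atomic updates to that address.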
