In this problem, we will compare the performance of a vector processor with a hybrid system that contains a scalar processor and a GPU-based coprocessor. In the hybrid system, the host processor has superior scalar performance to the GPU, so in this case all scalar code is executed on the host processor while all vector code is executed on the GPU. We will refer to the first system as the vector computer and the second system as the hybrid computer. Assume that your target application contains a vector kernel with an arithmetic intensity of 0.5 FLOPs per DRAM byte accessed; however, the application also has a scalar component that must be performed before and after the kernel in order to prepare the input vectors and output vectors, respectively. For a sample dataset, the scalar portion of the code requires 400 ms of execution time on both the vector processor and the host processor in the hybrid system. The kernel reads input vectors consisting of 200 MB of data and has output data consisting of 100 MB of data. The vector processor has a peak memory bandwidth of 30 GB/sec and the GPU has a peak memory bandwidth of 150 GB/sec. The hybrid system has an additional overhead that requires all input vectors to be transferred between the host memory and GPU local memory before and after the kernel is invoked. The hybrid system has a direct memory access (DMA) bandwidth of 10 GB/sec and an average latency of 10 ms. Assume that both the vector processor and GPU are performance bound by memory bandwidth. Compute the execution time required by both computers for this application.
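
The arithmetic can be set up as a short Python sketch. It is one possible reading of the problem, not a definitive solution, and it rests on assumptions the text does not state: MB and GB are decimal (10^6 and 10^9 bytes), the kernel streams its 200 MB of input and 100 MB of output through memory exactly once, and the hybrid computer performs one DMA transfer in each direction (input to the GPU, output back to the host), each paying the 10 ms latency once. Because both machines are taken to be bound by memory bandwidth, the arithmetic intensity does not enter the timing. All function and constant names are illustrative.

```python
# Assumption: decimal units (1 MB = 1e6 bytes, 1 GB = 1e9 bytes).
MB = 1e6
GB = 1e9

SCALAR_MS = 400.0          # scalar portion, same on both systems
INPUT_BYTES = 200 * MB     # kernel input vectors
OUTPUT_BYTES = 100 * MB    # kernel output data


def stream_time_ms(num_bytes, bandwidth_bytes_per_s):
    """Time to move num_bytes at the given bandwidth, in milliseconds."""
    return num_bytes / bandwidth_bytes_per_s * 1e3


def dma_time_ms(num_bytes, bandwidth_bytes_per_s, latency_ms):
    """One DMA transfer: fixed latency plus streaming time."""
    return latency_ms + stream_time_ms(num_bytes, bandwidth_bytes_per_s)


def vector_computer_ms():
    # Memory-bound kernel: all input and output bytes at 30 GB/sec.
    kernel = stream_time_ms(INPUT_BYTES + OUTPUT_BYTES, 30 * GB)
    return SCALAR_MS + kernel


def hybrid_computer_ms():
    # Memory-bound kernel on the GPU at 150 GB/sec.
    kernel = stream_time_ms(INPUT_BYTES + OUTPUT_BYTES, 150 * GB)
    # Assumption: one transfer per direction, each with 10 ms latency.
    copy_in = dma_time_ms(INPUT_BYTES, 10 * GB, 10.0)
    copy_out = dma_time_ms(OUTPUT_BYTES, 10 * GB, 10.0)
    return SCALAR_MS + copy_in + kernel + copy_out


if __name__ == "__main__":
    print(f"vector computer: {vector_computer_ms():.1f} ms")  # 410.0 ms
    print(f"hybrid computer: {hybrid_computer_ms():.1f} ms")  # 452.0 ms
```

Under these assumptions the vector computer takes about 410 ms (400 ms scalar plus 10 ms kernel) and the hybrid computer about 452 ms (400 ms scalar, 2 ms kernel, 30 ms copy in, 20 ms copy out). The GPU's faster kernel is outweighed by the DMA transfer overhead.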