CS 475/575 -- Spring


Vectorized Array Multiplication and Reduction using SSE

Introduction

There are many problems in scientific and engineering computing where you want to multiply arrays of numbers (matrix manipulation, Fourier transformation, convolution, etc.).

This project is in two parts. The first part is to test array multiplication, SIMD and non-SIMD. The second part is to test array multiplication and reduction, SIMD and non-SIMD.

Use the gcc or g++ compiler for both parts. However, because simd.p5.cpp uses assembly language, this code is not portable. I know for sure it works on flip, using gcc/g++ 4.8.5. You are welcome to try it elsewhere, but there are no guarantees; it does not work on rabbit. Do not use "-O3".
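The assignment still calls for the supplied simd.p5.cpp code. Purely to show what a "4 floats at a time" multiply looks like, here is a minimal sketch of the same operation written with SSE intrinsics from <xmmintrin.h> instead of assembly language. It is not the supplied code: the function name SimdMulIntrinsics is hypothetical, and the sketch assumes an x86/x86-64 compiler with SSE enabled (the default for x86-64 gcc/g++). Intrinsics are generally accepted by the major x86 compilers, which makes them easier to carry between machines than hand-written assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Hypothetical helper (not the supplied simd.p5.cpp code):
// computes c[i] = a[i] * b[i] four floats at a time with SSE intrinsics.
void SimdMulIntrinsics(const float* a, const float* b, float* c, std::size_t len)
{
    assert(len == 0 || (a != nullptr && b != nullptr && c != nullptr));
    std::size_t i = 0;
    // Main loop: one 128-bit multiply handles 4 floats per iteration.
    for (; i + 4 <= len; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);           // unaligned load of a[i..i+3]
        __m128 vb = _mm_loadu_ps(b + i);           // unaligned load of b[i..i+3]
        _mm_storeu_ps(c + i, _mm_mul_ps(va, vb));  // store 4 products into c[i..i+3]
    }
    // Cleanup loop: the remaining 0-3 elements are done one at a time.
    for (; i < len; i++)
        c[i] = a[i] * b[i];
}
```

If the arrays are known to be 16-byte aligned, the aligned forms _mm_load_ps and _mm_store_ps can replace the unaligned loads and stores.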

Requirements

1. Use the supplied SIMD SSE code to run an array multiplication timing experiment. Run the same experiment a second time using your own C/C++ array multiplication code.
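For the second run of requirement 1, the non-SIMD multiply can be a plain loop. The sketch below is one possible version; the timing wrapper, its name MegaMultsPerSec, and the use of std::chrono::steady_clock are assumptions for illustration, and the course code may time things differently. It reports mega-multiplies per second, so bigger numbers mean faster.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Non-SIMD array multiplication: c[i] = a[i] * b[i], one float at a time.
void NonSimdMul(const float* a, const float* b, float* c, std::size_t len)
{
    for (std::size_t i = 0; i < len; i++)
        c[i] = a[i] * b[i];
}

// Hypothetical timing wrapper: times one NonSimdMul call over the whole
// arrays and returns performance in mega-multiplies per second.
double MegaMultsPerSec(const std::vector<float>& a, const std::vector<float>& b,
                       std::vector<float>& c)
{
    assert(!a.empty() && a.size() == b.size() && b.size() == c.size());
    auto t0 = std::chrono::steady_clock::now();
    NonSimdMul(a.data(), b.data(), c.data(), a.size());
    auto t1 = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(a.size()) / seconds / 1.0e6;
}
```

It is worth checking compiler flags: newer gcc releases (12 and later) can auto-vectorize simple loops like this one even at -O2, which would make the "non-SIMD" run partly SIMD and shrink the measured speedup.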

2. Use the supplied SIMD SSE code to run an array multiplication + reduction timing experiment. Run the same experiment a second time using your own C/C++ array multiplication + reduction code.
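For requirement 2, the non-SIMD multiply + reduction is a dot product: multiply the arrays element by element and add every product into a single sum. A minimal sketch follows; the name NonSimdMulSum is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Non-SIMD multiply + reduction: returns the sum of a[i] * b[i].
float NonSimdMulSum(const float* a, const float* b, std::size_t len)
{
    assert(len == 0 || (a != nullptr && b != nullptr));
    float sum = 0.0f;
    for (std::size_t i = 0; i < len; i++)
        sum += a[i] * b[i];  // one multiply and one add per element
    return sum;
}
```

Do not expect the SIMD and non-SIMD sums to match bit for bit on large arrays. An SSE reduction typically keeps four partial sums (one per lane) and combines them at the end, so the floating-point additions happen in a different order.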

3. Use a range of array sizes from 1K to 32M. The in-between values are up to you, but pick values that will make a good graph.
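One way to choose the in-between sizes is to double each time, which gives 16 evenly spaced points on a log-scale x-axis. This sketch assumes 1K = 1024 and 32M = 32 x 1024 x 1024 floats; ArraySizes is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: powers of two from 1K (1024) to 32M (33,554,432),
// doubling each step.
std::vector<std::size_t> ArraySizes()
{
    const std::size_t kMin = std::size_t(1) << 10;  // 1K
    const std::size_t kMax = std::size_t(1) << 25;  // 32M
    std::vector<std::size_t> sizes;
    for (std::size_t n = kMin; n <= kMax; n *= 2)
        sizes.push_back(n);
    assert(sizes.front() == kMin && sizes.back() == kMax);
    return sizes;
}
```

At 32M floats each array takes 128 MiB, so three arrays (a, b, and c) need about 384 MiB of memory.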

4. If you like, run each array-size test for several trials. Record the peak performance. Compare the peak and average performance to make sure you are getting consistent answers; if the peak and average are not within, say, 20% of each other, run the test again.
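The peak-versus-average check can be automated. This sketch assumes one performance number per trial (higher is faster) and interprets "within 20%" as the average being no more than 20% below the peak; TrialStats and Summarize are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct TrialStats
{
    double peak;      // best (largest) performance across trials
    double average;   // mean performance across trials
    bool consistent;  // true if the average is within 'tolerance' of the peak
};

// perf holds one performance value per trial (e.g., mega-multiplies/sec).
TrialStats Summarize(const std::vector<double>& perf, double tolerance = 0.20)
{
    assert(!perf.empty());
    double peak = *std::max_element(perf.begin(), perf.end());
    double average = std::accumulate(perf.begin(), perf.end(), 0.0) / perf.size();
    // "Within 20%" taken here as (peak - average) <= 0.20 * peak.
    bool consistent = (peak - average) <= tolerance * peak;
    return {peak, average, consistent};
}
```

When consistent comes back false, rerun that array size rather than recording its peak.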

5. Create a table and a graph showing SSE/non-SSE speedup as a function of array size. Note: this is not a multithreading assignment, so you don't need to worry about NUMT (the number of threads). Speedup in this case is S = Psse/Pnon-sse = Tnon-sse/Tsse (P = performance, T = elapsed time). Plot both curves (array multiplication and array multiplication + reduction) on the same set of axes.
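Because P is work divided by time, and both versions do the same amount of work for a given array size, the two forms of S are equal: for example, with 1000 multiplies, Tnon-sse = 8 s and Tsse = 2 s give Psse/Pnon-sse = 500/125 = 4 = 8/2. The helpers below (hypothetical names SpeedUp and PrintRow) compute S from elapsed times and print one comma-separated table row that a spreadsheet or plotting tool can import.

```cpp
#include <cassert>
#include <cstdio>

// Speedup S = Tnon-sse / Tsse (equivalently Psse / Pnon-sse).
double SpeedUp(double tNonSse, double tSse)
{
    assert(tNonSse > 0.0 && tSse > 0.0);
    return tNonSse / tSse;
}

// Hypothetical helper: prints one CSV row of the results table:
// array size, multiply speedup, multiply+reduction speedup.
void PrintRow(long arraySize, double mulSpeedUp, double mulSumSpeedUp)
{
    std::printf("%ld,%.2f,%.2f\n", arraySize, mulSpeedUp, mulSumSpeedUp);
}
```

Since S grows as the SSE time shrinks, plotting S directly on the Y-axis already makes "up" mean "faster".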

6. The Y-axis performance units in this case will be "Speed-Up", i.e., dimensionless.

7. Be sure that the graphs are plotted so that "up" means "faster".

8. Your commentary write-up (turned in as a PDF file) should include:
1. What machine you ran this on
2. The table and graph
3. What patterns are you seeing in the speedups?
4. Are they consistent across a variety of array sizes?
5. Why or why not, do you think?
6. Knowing that SSE SIMD is 4-floats-at-a-time, why could you get a speedup of < 4.0 or > 4.0 in the array multiplication?
7. Knowing that SSE SIMD is 4-floats-at-a-time, why could you get a speedup of < 4.0 or > 4.0 in the array multiplication-reduction?


Attachment: project file.rar
