Question

Suppose you are designing a hardware prefetcher for the unblocked matrix transposition code above. The simplest type of hardware prefetcher only prefetches sequential cache blocks after a miss. More complicated "nonunit stride" hardware prefetchers can analyze a miss reference stream, then detect and prefetch nonunit strides. In contrast, software prefetching can determine nonunit strides as easily as it can determine unit strides. Assume that prefetches write directly into the cache and that there is no pollution (overwriting data that must be used before the prefetched data is used).

For best performance given a nonunit stride prefetcher, in the steady state of the inner loop, how many prefetches need to be outstanding at a given time?
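
To make the idea of a nonunit stride concrete, here is a minimal Python sketch of what such a prefetcher would observe and predict. The transposition code the question refers to is not reproduced on this page, so everything specific here is an assumption for illustration: a row-major N x N output matrix written one column element per inner-loop iteration, a hypothetical N of 256, and 8-byte elements. The function names are invented for this sketch and do not come from the question. It does not compute the answer, which also depends on miss latency and loop timing that are not given here.

```python
def transpose_write_addresses(n, elem_bytes, base=0, column=0):
    """Byte addresses touched by output[j][column] for j = 0..n-1.

    Assumes a row-major n x n output matrix (an assumption, not from the question).
    """
    return [base + (j * n + column) * elem_bytes for j in range(n)]


def detect_stride(addresses):
    """Return the constant stride of an address stream, or None if there is none."""
    if len(addresses) < 2:
        return None
    stride = addresses[1] - addresses[0]
    for prev, cur in zip(addresses[1:], addresses[2:]):
        if cur - prev != stride:
            return None
    return stride


def next_prefetches(addresses, count):
    """Predict the next `count` addresses by extending the detected stride."""
    stride = detect_stride(addresses)
    if stride is None:
        return []
    last = addresses[-1]
    return [last + stride * k for k in range(1, count + 1)]


if __name__ == "__main__":
    N = 256          # hypothetical matrix dimension
    ELEM_BYTES = 8   # hypothetical element size
    writes = transpose_write_addresses(N, ELEM_BYTES)
    print("detected stride (bytes):", detect_stride(writes))
    print("next prefetch addresses:", next_prefetches(writes[:4], 3))
```

Under these assumed sizes, consecutive writes are N * 8 bytes apart, so each one falls in a different cache block. A sequential-only prefetcher would fetch the neighbouring block instead, which is why the question asks about a nonunit stride prefetcher.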
