
Cache Block Search

 

We perform the L1 cache blocking search after the best register blocking is known. We would like to make the L1 blocks large to increase data reuse, but larger L1 blocks increase the probability of cache conflicts [LRW91]. Tradeoffs between M- and N-loop overheads, memory access patterns, and TLB structure also affect the best L1 size. We currently perform a relatively simple search of the L1 parameter space. For the $D \times D$ square case, we search the neighborhood centered at $3D^2 = L1$, where $L1$ is the L1 cache size in elements. We set $M_1$ to the values $\lfloor \phi \times D / M_0 \rfloor$, where $\phi \in \{0.25, 0.5, 1.0, 1.5, 2.0\}$ and $D = \sqrt{L1/3}$. $K_1$ and $N_1$ are set similarly. We benchmark the resulting 125 combinations with matrix sizes that either fit in the L2 cache or, if no L2 cache exists, stay within some upper bound. The L2 cache blocking search, when necessary, is performed in a similar way.
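The candidate enumeration described above can be sketched in a few lines of Python. This is only an illustration, not the actual search script: the function name `l1_candidates`, the five scale factors in `PHI`, the formula `floor(phi * D / M0)` with `D = sqrt(L1 / 3)`, and the clamp to a minimum of one register block are all assumptions made for the example. Block sizes are expressed in units of register blocks.

```python
import math
from itertools import product

# Assumed scale factors: five values per dimension give 5**3 = 125 combinations.
PHI = (0.25, 0.5, 1.0, 1.5, 2.0)


def l1_candidates(l1_elems, m0, k0, n0, phis=PHI):
    """Enumerate candidate (M1, K1, N1) L1 block sizes.

    l1_elems   -- L1 cache size in matrix elements (not bytes)
    m0, k0, n0 -- register block sizes found by the earlier search
    Each returned size is counted in register blocks (an assumption).
    """
    # Side length D at which three D x D blocks exactly fill L1: 3 * D**2 = L1.
    d = math.sqrt(l1_elems / 3.0)

    def sizes(r0):
        # Clamp to 1 so every candidate holds at least one register block.
        return [max(1, math.floor(phi * d / r0)) for phi in phis]

    # Cartesian product of the per-dimension candidate lists.
    return list(product(sizes(m0), sizes(k0), sizes(n0)))


if __name__ == "__main__":
    # Hypothetical machine: 16 KB L1 holding 8-byte doubles = 2048 elements.
    candidates = l1_candidates(2048, m0=2, k0=1, n0=8)
    print(len(candidates), "combinations to benchmark")
    for m1, k1, n1 in candidates[:3]:
        print(m1, k1, n1)
```

Each tuple would then be passed to the benchmark, and the fastest combination kept. Because of the clamp, very large register blocks can yield repeated candidates; a real search might deduplicate them before timing.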



Richard Vuduc
Tue Nov 18 15:58:12 PST 1997