
Matrix Multiply Search Scripts

 

The search script takes parameters describing the machine architecture, including the number of integer and floating-point registers and the sizes of each level of cache. For each combination of generator parameters and compilation options, the matrix multiply search script calls the generator, compiles the resulting routine, links it with timing code, and benchmarks the resulting executable.
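The loop described above can be sketched as follows. This is only an illustration, not the actual search script: the parameter names `M0`, `K0`, `N0`, the flag strings, and the `toy_build_and_time` function are hypothetical stand-ins. In a real run, the `build_and_time` callable would invoke the generator, the compiler, the linker, and the timing executable, and return the measured performance.

```python
from itertools import product


def search(param_grid, compiler_flags, build_and_time):
    """Try every (generator parameters, compiler flags) pair; keep the fastest.

    build_and_time(params, flags) is assumed to generate, compile, link, and
    benchmark one candidate, returning a score (higher is better) or None if
    the candidate could not be built.
    """
    best = None
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        for flags in compiler_flags:
            score = build_and_time(params, flags)
            if score is None:
                continue  # skip candidates that failed to build
            if best is None or score > best[0]:
                best = (score, params, flags)
    return best


def toy_build_and_time(params, flags):
    """Hypothetical stand-in for generate + compile + link + benchmark."""
    # Pretend that blocks needing too many registers cannot be built.
    if params["M0"] * params["N0"] > 8:
        return None
    score = 2 * params["M0"] * params["K0"] * params["N0"] + params["M0"]
    if flags == "-O3":
        score += 1
    return score


# Illustrative search space; real ranges would be derived from the
# register counts and cache sizes passed to the search script.
grid = {"M0": [1, 2, 4], "K0": [1, 2], "N0": [2, 4]}
best = search(grid, ["-O2", "-O3"], toy_build_and_time)
```

Passing the build-and-time step in as a function keeps the search loop independent of any particular generator or compiler, which also makes the loop easy to exercise without a toolchain.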

To produce a complete BLAS GEMM routine, we find separate parameters for each of the three cases $A \times B$, $A^T \times B$, and $A \times B^T$ ($A^T \times B^T$ has code identical to $A \times B$). For each case, we first find the best register (or L0) parameters for in-L1-cache matrices, then find the best L1 parameters for in-L2-cache matrices, etc. While this strategy is not guaranteed to find the best L0 core for out-of-L1-cache matrices, the resulting cores have performed well in practice.
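The level-by-level strategy amounts to a greedy search: each level's parameters are chosen with all lower levels already fixed, and are never revisited. A minimal sketch follows, assuming hypothetical candidate lists and a toy `toy_measure` function in place of real benchmarks; none of these names come from the search scripts themselves.

```python
def hierarchical_search(levels, candidates, measure):
    """Pick parameters one memory level at a time, innermost level first.

    measure(level, trial) is assumed to benchmark a matrix size that fits in
    the next larger level of the hierarchy, using the parameters in `trial`,
    and to return a score where higher is better.
    """
    chosen = {}
    for level in levels:
        best_score, best_params = None, None
        for params in candidates[level]:
            trial = dict(chosen)  # lower levels stay fixed
            trial[level] = params
            score = measure(level, trial)
            if best_score is None or score > best_score:
                best_score, best_params = score, params
        chosen[level] = best_params
    return chosen


def toy_measure(level, trial):
    """Hypothetical scoring function standing in for a timed run."""
    m0, n0 = trial["L0"]
    score = m0 * n0
    if level == "L1":
        # Pretend a block size of 32 suits the toy L1 cache best.
        score -= abs(trial["L1"] - 32)
    return score


levels = ["L0", "L1"]
candidates = {"L0": [(1, 1), (2, 2), (2, 4)], "L1": [16, 32, 64]}
chosen = hierarchical_search(levels, candidates, toy_measure)
```

Because the L0 choice is frozen before any L1 candidate is timed, the cost grows with the sum of the candidate counts per level, not their product. The price is the one noted above: the L0 core that wins for in-L1-cache matrices is not necessarily the best one for larger matrices.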





Richard Vuduc
Tue Nov 18 15:58:12 PST 1997