This paper has demonstrated that portable, high-performance ANSI C code for matrix multiply can be written using parameterized code generators and a timing-driven search strategy.
The PHiPAC alpha release contains the matrix multiply generator, the naive search scripts written in Perl, and our timing libraries. We have created a Web site from which the alpha release is available and on which we plan to list blocking parameters for many systems [BAD+]. We are currently working on a better L1 blocking strategy and accompanying methods for search based on various criteria [LRW91]. The PHiPAC GEMM can be used with Bo Kågström's GEMM-based BLAS3 package [BLL93] and LAPACK [ABB+92].
We have also written parameterized generators for matrix-vector and vector-matrix multiply, dot product, AXPY, convolution, and outer product. Further generators, such as one for the FFT, are planned.
We wish to thank Ed Rothberg of SGI for help in obtaining the R8K and R10K performance plots. We also wish to thank Nelson Morgan, who provided the initial impetus for this project, and Dominic Lam for work on the initial search scripts.