
Status, Availability, and Future Work

 

This paper has demonstrated that portable, high-performance ANSI C code for matrix multiply can be produced using parameterized code generators and a timing-driven search strategy.

The PHiPAC alpha release contains the matrix multiply generator, the naive search scripts written in Perl, and our timing libraries. We have created a Web site from which the alpha release is available and on which we plan to list blocking parameters for many systems [BAD+]. We are currently working on a better L1 blocking strategy and accompanying methods for search based on various criteria [LRW91]. The PHiPAC GEMM can be used with Bo Kågström's GEMM-based BLAS3 package [BLL93] and LAPACK [ABB+92].
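
As a rough illustration of what a timing-driven search over a blocking parameter looks like, the following ANSI C sketch times a simple blocked matrix multiply for each candidate block size and keeps the fastest. It is not the PHiPAC generator or its Perl search scripts: the names matmul_blocked and search_block_size, the single square block size nb, and the use of clock() as the timer are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* C += A*B for n x n row-major matrices, using square tiles of size nb.
   nb is a hypothetical stand-in for a generator's blocking parameters. */
void matmul_blocked(size_t n, size_t nb,
                    const double *a, const double *b, double *c)
{
    size_t ii, jj, kk, i, j, k;

    assert(nb > 0);
    for (ii = 0; ii < n; ii += nb)
        for (kk = 0; kk < n; kk += nb)
            for (jj = 0; jj < n; jj += nb) {
                /* Clamp tile bounds so n need not be a multiple of nb. */
                size_t imax = ii + nb < n ? ii + nb : n;
                size_t kmax = kk + nb < n ? kk + nb : n;
                size_t jmax = jj + nb < n ? jj + nb : n;
                for (i = ii; i < imax; i++)
                    for (k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

/* Naive search: time each candidate block size once and return the fastest.
   On return, c holds A*B as computed with the last candidate tried. */
size_t search_block_size(size_t n, const size_t *cand, size_t ncand,
                         const double *a, const double *b, double *c)
{
    size_t best, t, x;
    double best_time = -1.0;

    assert(ncand > 0);
    best = cand[0];
    for (t = 0; t < ncand; t++) {
        clock_t start, stop;
        double elapsed;

        for (x = 0; x < n * n; x++)
            c[x] = 0.0;                     /* reset the output matrix */
        start = clock();
        matmul_blocked(n, cand[t], a, b, c);
        stop = clock();
        elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best_time < 0.0 || elapsed < best_time) {
            best_time = elapsed;
            best = cand[t];
        }
    }
    return best;
}
```

A caller would pass matrices large enough to exceed the cache level of interest together with a list of candidate sizes, for example {8, 16, 32, 64}. A more careful search would repeat each timing several times and use a higher-resolution timer than clock(), since a single short run is noisy.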

We have also written parameterized generators for matrix-vector and vector-matrix multiply, dot product, AXPY, convolution, and outer product. Further generators, such as one for the FFT, are planned.

We wish to thank Ed Rothberg of SGI for help obtaining the R8K and R10K performance plots. We also wish to thank Nelson Morgan, who provided the initial impetus for this project, and Dominic Lam for work on the initial search scripts.



Richard Vuduc
Tue Nov 18 15:58:12 PST 1997