References
- ABB+92
-
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen.
LAPACK users' guide, release 1.0.
SIAM, Philadelphia, 1992.
- ACF95
-
B. Alpern, L. Carter, and J. Ferrante.
Space-limited procedures: A methodology for portable
high-performance.
In International Working Conference on Massively Parallel
Programming Models, 1995.
- AGZ94
-
R. Agarwal, F. Gustavson, and M. Zubair.
IBM Engineering and Scientific Subroutine Library, Guide and
Reference, 1994.
Available through IBM branch offices.
- BAD+
-
J. Bilmes, K. Asanovic, J. Demmel, D. Lam, and C.W. Chin.
The PHiPAC WWW home page.
http://www.icsi.berkeley.edu/~bilmes/phipac.
- BAD+96
-
J. Bilmes, K. Asanovic, J. Demmel, D. Lam, and C.W. Chin.
PHiPAC: A portable, high-performance, ANSI C coding methodology
and its application to matrix multiply.
LAPACK working note 111, University of Tennessee, 1996.
- BLL93
-
B. Kågström, P. Ling, and C. Van Loan.
Portable high performance GEMM-based level 3 BLAS.
In R.F. Sincovec et al., editors, Parallel Processing for
Scientific Computing, pages 339-346, Philadelphia, 1993. SIAM Publications.
- BLS91
-
D. H. Bailey, K. Lee, and H. D. Simon.
Using Strassen's algorithm to accelerate the solution of linear
systems.
J. Supercomputing, 4:357-371, 1991.
- CDD+96
-
J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet,
K. Stanley, D. Walker, and R.C. Whaley.
ScaLAPACK: A portable linear algebra library for distributed memory
computers - design issues and performance.
LAPACK working note 95, University of Tennessee, 1996.
- CFH95
-
L. Carter, J. Ferrante, and S. Flynn Hummel.
Hierarchical tiling for improved superscalar performance.
In International Parallel Processing Symposium, April 1995.
- DCDH90
-
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling.
A set of level 3 basic linear algebra subprograms.
ACM Trans. Math. Soft., 16(1):1-17, March 1990.
- DCHH88
-
J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson.
An extended set of FORTRAN basic linear algebra subroutines.
ACM Trans. Math. Soft., 14:1-17, March 1988.
- GL89
-
G.H. Golub and C.F. Van Loan.
Matrix Computations.
Johns Hopkins University Press, 1989.
- KHM94
-
C. Kamath, R. Ho, and D.P. Manley.
DXML: A high-performance scientific subroutine library.
Digital Technical Journal, 6(3):44-56, Summer 1994.
- LHKK79
-
C. Lawson, R. Hanson, D. Kincaid, and F. Krogh.
Basic linear algebra subprograms for FORTRAN usage.
ACM Trans. Math. Soft., 5:308-323, 1979.
- LRW91
-
M. S. Lam, E. E. Rothberg, and M. E. Wolf.
The cache performance and optimizations of blocked algorithms.
In Proceedings of ASPLOS IV, pages 63-74, April 1991.
- MS95
-
J.D. McCalpin and M. Smotherman.
Automatic benchmark generation for cache optimization of matrix
algorithms.
In R. Geist and S. Junkins, editors, Proceedings of the 33rd
Annual Southeast Conference, pages 195-204. ACM, March 1995.
- SMP+96
-
R. Saavedra, W. Mao, D. Park, J. Chame, and S. Moon.
The combined effectiveness of unimodular transformations, tiling, and
software prefetching.
In Proceedings of the 10th International Parallel Processing
Symposium, April 15-19, 1996.
- WL91
-
M. E. Wolf and M. S. Lam.
A data locality optimizing algorithm.
In Proceedings of the ACM SIGPLAN'91 Conference on Programming
Language Design and Implementation, pages 30-44, June 1991.
- Wol96
-
M. Wolfe.
High performance compilers for parallel computing.
Addison-Wesley, 1996.
Richard Vuduc
Tue Nov 18 15:58:12 PST 1997