Automatic Performance Tuning (Bebop)
- Optimization and Performance Modeling of Stencil
Computations on Modern Microprocessors
- (SIAM Review, December 2008)
- Kaushik Datta, Shoaib Kamil, Samuel Williams, Leonid Oliker, John
Shalf, and Katherine Yelick
- PDF (2.8 MB)
- Avoiding Communication in Sparse Matrix Computations
- (IEEE International Parallel and
Distributed Processing Symposium, April 2008)
- James Demmel, Mark Hoemmen, Marghoob Mohiyuddin, and Katherine Yelick
- PDF (1 MB)
- Talk slides: PDF (6.2
MB)
- Lattice Boltzmann Simulation Optimization on
Leading Multicore Platforms
- (IEEE International Parallel and
Distributed Processing Symposium, April 2008) [Winner, Best
Paper for Applications track]
- Samuel Williams, Jonathan Carter, Leonid Oliker, John Shalf, and
Katherine Yelick
- PDF (560k)
- Talk slides: PDF (10.4 MB)
| PPT (2.6 MB)
- Optimization of Sparse Matrix-Vector Multiplication
on Emerging Multicore Platforms
- (Supercomputing 2007, November 2007)
- Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine
Yelick, and James Demmel
- PDF (438k)
- Talk slides: PDF (6.4 MB)
| PPT (2.5 MB)
- Avoiding Communication in Computing Krylov Subspaces
- (UCB/EECS-2007-123, October 2007)
- James Demmel, Mark Hoemmen, Marghoob Mohiyuddin, and Katherine Yelick
- PDF
(34.9 MB)
- Scientific Computing Kernels on the Cell
Processor
- (International
Journal of Parallel Programming, April 2007)
- Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry
Husbands, and Katherine Yelick
- PDF (376k)
- When Cache Blocking Sparse Matrix Vector Multiply Works
and Why
- (Applicable Algebra in Engineering, Communication and Computing, March 2007)
- Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, and Katherine Yelick
- PDF (390k)
- Benchmarking Sparse Matrix-Vector Multiply in Five Minutes
- (SPEC Benchmark Workshop 2007, Austin, TX, January 2007)
- Hormozd Gahvari, Mark Hoemmen, James Demmel, Katherine Yelick
- PDF (1 MB)
- Talk slides: PPT (6.4 MB)
- Implicit and Explicit Optimizations for Stencil
Computations
- (Memory Systems Performance and Correctness, San Jose, California, USA, October 2006)
- Shoaib Kamil, Kaushik Datta, Samuel Williams, Leonid Oliker, John
Shalf, and Katherine Yelick
- PDF (604k)
- Talk slides: PDF (3.2 MB)
- The Potential of the Cell Processor for Scientific
Computing
- (Computing Frontiers 2006, Ischia, Italy, May
2006)
- Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry
Husbands, and Katherine Yelick
- PDF (216k)
- OSKI: A library of automatically tuned sparse matrix kernels
- (Proceedings of SciDAC 2005, Journal of Physics: Conference Series, June 2005)
- Richard Vuduc, James Demmel, and Katherine Yelick.
- PDF (190k)
- Self-Adapting Linear Algebra Algorithms and Software
- (Proceedings of the IEEE, Special
Issue on Program Generation, Optimization, and Adaptation,
93(2), February 2005)
- James Demmel, Jack Dongarra, Victor Eijkhout, Erika Fuentes, Antoine
Petitet, Richard Vuduc, R. Clint Whaley, and Katherine Yelick.
- PDF (600k)
- Performance Models for Evaluation and Automatic
Tuning of Symmetric Sparse Matrix-Vector Multiply
- (International Conference on Parallel Processing, Montreal,
Quebec, Canada, August 2004) [Winner, Best Paper Award]
- Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick.
- PDF (178k) | Gzip'd PostScript (204k)
- Talk slides: PDF (540k)
- Toward automatic performance tuning of matrix
triple products based on matrix structure
- (PARA'04 Workshop on State-of-the-art in Scientific Computing, Copenhagen, Denmark, June 2004.)
- Eun-Jin Im, Ismail Bustany, Cleve Ashcraft, James Demmel, and Katherine Yelick.
- Performance Modeling and Analysis of Cache
Blocking in Sparse Matrix Vector Multiply
- (UCB/CSD-04-1335, June 2004)
- Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, and Katherine A. Yelick
- PDF (~8MB)
- SPARSITY: An Optimization Framework for Sparse Matrix Kernels
- (International Journal of High Performance Computing Applications, 18 (1), pp. 135-158, February 2004)
- Eun-Jin Im, Katherine A. Yelick, and Richard Vuduc.
- PDF (1.1M)
| Gzip'd PostScript (1.2M)
- Performance Optimizations and Bounds for
Sparse Symmetric Matrix-Multiple Vector Multiply
- (UCB/CSD-03-1297, November 2003)
- Benjamin C. Lee, Richard W. Vuduc, James W. Demmel, Katherine
A. Yelick, Michael de Lorimier, and Lijue Zhong.
- PDF (867k) | Gzip'd PostScript (1.3M)
- Memory Hierarchy Optimizations and Performance Bounds for Sparse A^T A*x
- (ICCS 2003: Workshop on Parallel Linear Algebra, Melbourne, Australia, June 2003)
- Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick.
- Abstract
| PDF (328k)
| Gzip'd PostScript (91k)
- Talk slides: PDF (735k)
| Gzip'd PostScript, 4-up (138k)
- Extended version:
U.C. Berkeley Technical Report UCB/CSD-03-1232
-
Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
- Richard Vuduc, James W. Demmel, Katherine A. Yelick,
Shoaib Kamil, Rajesh Nishtala, Benjamin Lee.
- SC 2002 (High Performance
Networking and Computing, commonly called "Supercomputing").
Baltimore, November 2002.
- Available in pdf (834k)
| Gzip'd PostScript (2.7M)
-
Automatic Performance Tuning and Analysis of Sparse
Triangular Solve
- Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala,
James W. Demmel, Katherine A. Yelick.
- ICS 2002:
Workshop
on Performance Optimization via High-Level Languages and
Libraries. New York, June 22-26, 2002.
- Available in pdf (548k)
| Gzip'd PostScript (1.2M)
-
Optimizing Sparse Matrix-Vector Multiplication for Register Reuse
- E. Im and K. A. Yelick
- International Conference on Computational Science, San Francisco,
California, May 2001.
- (Postscript)
-
Optimizing Sparse Matrix Kernels for Data Mining
- E. Im and K. A. Yelick
- Proceedings of the Text Mine Workshop
- Chicago, IL, April 2001.
- (Postscript)
-
Optimizing Sparse Matrix Vector Multiplication on SMPs
- E. Im and K. A. Yelick
- SIAM Conf. Parallel Processing for Scientific Computing, San
Antonio, TX, March 1999.
- (Postscript)
-
Model-based Memory Hierarchy Optimizations for Sparse Matrices
- E. Im and K. A. Yelick
- Workshop on Profile and Feedback-Directed Compilation, Paris,
France, October 1998.
- (Postscript)
Intelligent RAM (IRAM)
-
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines,
- Brian R. Gaeke, Parry Husbands, Xiaoye S. Li, Leonid Oliker,
Katherine A. Yelick, and Rupak Biswas.
- Proceedings of the International Parallel and Distributed Processing
Symposium (IPDPS). Ft. Lauderdale, FL.
- April, 2002.
- Available in PDF.
-
Hardware/Compiler Co-development for an Embedded Media Processor,
- C. Kozyrakis, D. Judd, J. Gebis, S. Williams, D. Patterson, K. Yelick
- Proceedings of the IEEE, vol. 89, no. 11, November 2001, pp. 1694-1709.
- Draft available in PDF.
-
Exploiting On-Chip Memory Bandwidth in the VIRAM Compiler,
- D. Judd, K. Yelick, C. Kozyrakis, D. Martin, and D. Patterson,
- Second Workshop on Intelligent Memory Systems, Cambridge,
November 2000.
- Available in Postscript.
-
Performance Analysis of an H.263 Video Encoder on VIRAM,
- T. Nguyen, A. Zakhor and K. Yelick
- International Conference on Image Processing (ICIP),
- Vancouver, B.C., Canada, September 2000.
- Available in PDF
-
Efficient FFTs on IRAM
- Thomas, R. and Yelick, K.
- First Workshop on Media Processors and DSPs,
November 15, 1999.
- Postscript available.
-
Scalable processors in the billion-transistor era: IRAM
- Kozyrakis, C.E., Perissakis, S., Patterson, D., Anderson, T.,
Asanovic, K., Cardwell, N., Fromm, R., Golbus, J., Gribstad, B.,
Keeton, K., Thomas, R., Treuhaft, N., Yelick, K.
- Computer, vol. 30, no. 9, IEEE Computer Society,
Sept. 1997, pp. 75-78.
- Available in PDF.
-
The Energy Efficiency of IRAM Architectures
- R. Fromm, S. Perissakis, N. Cardwell, D. Patterson, T. Anderson,
and K. Yelick
- Proceedings of the 24th Annual International Conference on
Computer Architecture, June 1997.
- Available in Postscript.
-
A Case for Intelligent DRAM: IRAM
- D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton,
C. Kozyrakis, R. Thomas, and K. Yelick.
- IEEE Micro, April 1997, pp. 34-44.
Also appeared as an Award Paper, Hot Chips VIII, August 1996.
- Available in PDF or
Postscript.
-
Intelligent RAM (IRAM): Chips that remember and compute
- D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton,
C. Kozyrakis, R. Thomas, and K. Yelick.
- Proceedings of the 1997 IEEE International Solid-State Circuits
Conference, February 1997, pp. 224-225.
- Available in PDF or
Postscript.
Clusters (includes ISTORE and ROC)
-
ROC-1: Hardware Support for Recovery-Oriented Computing.
- Oppenheimer, D., A. Brown, J. Beck, D. Hettena, J. Kuroda,
N. Treuhaft, D.A. Patterson, and K. Yelick.
- IEEE Transactions on Computers Special Issue on
Embedded Fault-Tolerant Computer Systems, Jul.-Aug., 2001.
- Available in PDF
-
Cluster I/O with River: Making the Fast Case Common
- R. H. Arpaci-Dusseau, E. Anderson, N. Treuhaft, D. E. Culler,
J. M. Hellerstein, D. A. Patterson, and K. A. Yelick
- Workshop on I/O in Parallel and Distributed Systems, Atlanta, GA,
May 1999.
- Postscript available.
Parallel Applications
-
Performance Modeling and Composition: A Case Study in
Cell Simulation
- Steve G. Steinberg, Jun Yang, and Katherine Yelick, IPPS '96
April 1996.
- Abstract,
Postscript available.
-
Parallelizing the Phylogeny Problem
- J. Jones and K. Yelick, Supercomputing '95
December 1995.
- Abstract,
Postscript available.
-
Connected Components on Distributed Memory Machines
- A. Krishnamurthy, S. Lumetta, D. Culler, and K. Yelick,
June 1994.
- Abstract,
Postscript available.
-
Parallel Timing Simulation on a Distributed Memory Multiprocessor
- Chih-Po Wen and Katherine Yelick, International Conference on
Computer Aided Design, Santa Clara, California, November 1993.
- Abstract,
Postscript available.
-
Implementing an Irregular Application on a Distributed Memory
Multiprocessor
- Soumen Chakrabarti and Katherine Yelick, ACM Symposium on
Principles and Practice of Parallel Programming, San Diego,
California, June 1993.
- Abstract,
Postscript available.
-
A Parallel Completion Procedure for Term Rewriting Systems
- Katherine Yelick and Stephen J. Garland, Conference on
Automated Deduction, June 1992.
- Abstract,
Postscript available.
Compilation
-
Analyses and Optimizations for Shared Address Space Programs
- A. Krishnamurthy and K. Yelick
- Journal of Parallel and Distributed Computing, 1996.
- Postscript available.
-
Optimizing Parallel Programs with
Explicit Synchronization
- Arvind Krishnamurthy and Katherine Yelick,
Programming Language Design and Implementation,
La Jolla, California, June 1995.
- Abstract,
Postscript available.
-
Optimizing Parallel SPMD Programs
- Arvind Krishnamurthy and Katherine Yelick,
Seventh Annual Workshop on Languages and Compilers for Parallel
Computing, Ithaca, New York, August 1994.
- Abstract,
Postscript available.
-
Compiling Sequential Programs for Speculative Parallelism
- Chih-Po Wen and Katherine Yelick, International Conference on
Parallel and Distributed Systems, National Taiwan University,
Taiwan, December 1993.
- Abstract,
Postscript available.
Scheduling and Load Balancing
-
Models and Scheduling Algorithms for Mixed Data and Task Parallel
Programs
- S. Chakrabarti, J. Demmel, and K. Yelick
- Journal of Parallel and Distributed Computing,
Vol. 47, pp. 168-184, 1997.
-
Modeling the Benefits of Mixed Data and Task Parallelism
- Soumen Chakrabarti, James Demmel, and Katherine Yelick,
Symposium on Parallel Algorithms and Architectures, Santa Barbara,
California, July 1995.
- Abstract,
Postscript available.
-
Randomized Load Balancing for Tree Structured Computation
- Soumen Chakrabarti, Abhiram Ranade, and Katherine Yelick,
IEEE Scalable High Performance Computing Conference, Knoxville,
Tennessee, May 1994.
- Abstract,
Postscript available.
Distributed Data Structures & the Multipol Library
-
Portable Parallel Irregular Applications.
- K. Yelick, C.-P. Wen, S. Chakrabarti, E. Deprit,
J. Jones, A. Krishnamurthy, Workshop on Parallel Symbolic
Languages and Systems, Beaune, France, October 1995.
To appear in Lecture Notes in Computer Science.
- Abstract,
Postscript available.
-
Multipol: A Distributed Data Structure Library.
- S. Chakrabarti, E. Deprit, J. Jones, A. Krishnamurthy,
E.-J. Im, C.-P. Wen, and K. Yelick, UCB//CSD-95-879, July 1995.
- Abstract,
Postscript available.
-
Portable Runtime Support for Asynchronous Simulation
- Chih-Po Wen and Katherine Yelick, International Conference on
Parallel Processing, August 1995.
- Abstract,
Postscript available.
-
Runtime Support for Portable Distributed Data Structures
- C.-P. Wen, S. Chakrabarti, E. Deprit,
A. Krishnamurthy and K. Yelick,
Workshop on Languages, Compilers, and Runtime Systems for
Scalable Computers, May 1995.
- Postscript available.
-
Distributed Data Structures and Algorithms for Gröbner Basis
Computation
- Soumen Chakrabarti and Katherine Yelick, Lisp and Symbolic
Computation, Vol. 7, 1994.
- Abstract available.
-
Data Structures for Irregular Applications
- K. Yelick, S. Chakrabarti, E. Deprit, J. Jones, A. Krishnamurthy,
and C.-P. Wen, DIMACS Workshop on Parallel Algorithms for
Unstructured and Dynamic Problems, Piscataway, New Jersey, June 1993.
- Abstract,
Postscript available.
-
Programming Models for Irregular Applications
- Katherine Yelick.
Workshop on Languages and Compilers and Run-Time Environments
for Distributed Memory Multiprocessors, October 1992.
Also appeared in SIGPLAN Notices, January 1993.
- Postscript available.
-
A Survey of Portable Message Passing Libraries
- Chih-Po Wen and Katherine Yelick, unpublished manuscript,
October 15, 1992.
- Postscript available.
Parallel Languages: Split-C, Titanium, and UPC
-
An Evaluation of Current High Performance Networks,
- C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove,
P. Husbands, C. Iancu, M. Welcome, K. Yelick,
- International Parallel and Distributed Processing Symposium,
Nice, France, April 22-26, 2003.
- Available in PDF
-
Introduction to UPC and Language Specification,
- W. Carlson, J. Draper, D. Culler, K. Yelick, E. Brooks, and K. Warren,
- CCS-TR-99-157, IDA Center for Computing Sciences, 1999.
- Available in PDF
-
Titanium: A High-Performance Java Dialect
- K. A. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A.
Krishnamurthy, P. N. Hilfinger, S. L. Graham, D. Gay, P. Colella,
and A. Aiken
- Concurrency: Practice and Experience,
Vol. 10, No. 11-13, September-November 1998. An earlier version was
presented at the Workshop on Java for High-Performance Network Computing,
Palo Alto, CA, Feb. 1998.
- Postscript available.
-
Empirical Evaluation of Global Memory Support on the Cray-T3D and
Cray-T3E
- A. Krishnamurthy, D. Culler, and K. Yelick
- UCB//CSD-98-991, 1998.
- Postscript available.
-
Evaluation of Architectural Support for Global
Address-Based Communication in Large-Scale Parallel Machines
- Arvind Krishnamurthy, Klaus E. Schauser, Chris Scheiman, Randy Wang,
David Culler, and Katherine Yelick,
Proceedings of Architectural Support for Programming Languages and
Operating Systems, Cambridge, MA, November 1996.
- Postscript available.
-
Empirical Evaluation of the CRAY-T3D: A Compiler Perspective
- Remzi H. Arpaci, David E. Culler, Arvind Krishnamurthy,
Steve G. Steinberg, and Katherine Yelick,
International Symposium on Computer Architecture,
Santa Margherita Ligure, Italy, June 1995.
- Abstract,
Postscript available.
-
Parallel Programming in Split-C
- D. Culler, A. Dusseau, S. Goldstein, A. Krishnamurthy, S. Lumetta,
T. von Eicken, and K. Yelick, Supercomputing, Portland, Oregon,
November 1993.
- Abstract,
Postscript available.
Symbolic Computation
-
On the Correctness of a Distributed Memory Gröbner Basis
Algorithm
- Soumen Chakrabarti and Katherine Yelick, International Conference
on Rewriting Techniques and Applications, Montreal, Canada, June 1993.
- Abstract,
Postscript available.
-
Compiling Verilog into Finite State Machines
- S.-T. Cheng, R. Brayton, G. York, K. Yelick, A. Saldanha.
International Verilog Conference, 1995.
- Abstract,
Postscript available.
-
Using Abstraction in Explicitly Parallel Programs
- Katherine A. Yelick, MIT Laboratory for Computer Science,
July 1991, TR-507. (Revised from PhD Thesis, December 1990.)
- Abstract,
Postscript available.
-
A Generalized Approach to Equational Unification
- Katherine A. Yelick, MIT Laboratory for Computer Science,
August 1985, TR-344.