CS267: Lecture 2

Memory Hierarchies

August 29, 2001

Lecturer: Kathy Yelick

Abstract

We study the structure and performance properties of modern processors, with special attention to their memory hierarchies. We describe a type of memory benchmark that can be used to expose performance features of the memory hierarchy, and look at some examples from specific machines. We discuss optimization techniques for uniprocessors, especially cache and register blocking (also called tiling). Matrix multiply is a running example in the lecture.
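To make the blocking (tiling) idea concrete, here is a minimal sketch of a cache-blocked matrix multiply in C. It is not the code from the lecture; the matrix size N and the block size BLOCK are illustrative values that would in practice be tuned so that three BLOCK x BLOCK tiles fit in cache.

#include <stdio.h>

#define N 512      /* matrix dimension (illustrative) */
#define BLOCK 32   /* block size; assumed to divide N evenly */

static double A[N][N], B[N][N], C[N][N];

/* Blocked (tiled) matrix multiply: C = C + A * B.
 * The three outer loops walk over BLOCK x BLOCK tiles; the three inner
 * loops do an ordinary matrix multiply on one tile, so each tile of A
 * and B is reused many times while it is resident in cache. */
void matmul_blocked(void)
{
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int jj = 0; jj < N; jj += BLOCK)
            for (int kk = 0; kk < N; kk += BLOCK)
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int j = jj; j < jj + BLOCK; j++) {
                        double sum = C[i][j];
                        for (int k = kk; k < kk + BLOCK; k++)
                            sum += A[i][k] * B[k][j];
                        C[i][j] = sum;
                    }
}

int main(void)
{
    /* Fill A and B with simple values; C starts zeroed (static storage). */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
        }

    matmul_blocked();
    printf("C[0][0] = %g (expected %g)\n", C[0][0], 2.0 * N);
    return 0;
}

The unblocked version is the same code with BLOCK equal to N; comparing the two running times on matrices larger than the cache is one way to see the effect discussed in lecture.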

2001 Lecture Notes

PowerPoint, Postscript, PDF

Readings

  • 1996 Lecture Notes (in HTML): Part 1, Part 2; these discuss the IBM RS6000 processor
  • International Conference on Computational Science, Special Session on Automatic Performance Tuning
  • "Empirical Evaluation of the Cray T3D - A Compiler Perspective" (explains the memory benchmark plot used in lecture)
  • BeBOP Homepage
  • ATLAS Homepage
  • BLAS (Basic Linear Algebra Subroutines), reference (unoptimized) implementations of the BLAS, with documentation
  • LAPACK (Linear Algebra PACKage), a standard linear algebra library optimized to use the BLAS effectively on uniprocessors and shared memory machines (software, documentation and reports)
  • ScaLAPACK (Scalable LAPACK), a parallel version of LAPACK for distributed memory machines (software, documentation and reports)

Assignments

Assignment 1 (due 9/19/01). We have assigned "multidisciplinary" teams of 2-3 students for this assignment. If you are not in a team, please contact David Bindel (dbindel@cs).