CS 258 Course Materials

Readings and Lecture Slides

Fundamentals and Introduction

Chapter 1 : Fundamentals. Reading for lectures 1,2,3.

Lecture 1 : Why Parallel Architecture. 1/18/95

Lecture 2 and 3 : Evolution of Parallel Machines. 1/23/95 and 1/25/95

Parallel Software Basics

Chapter 2A: Parallel Software Basics, part A. Reading for lectures 4,5.

Lecture 4 : Parallel Software Basics. 2/1/95

Lecture 5 : Programming for Performance. 2/3/95

Scaling Parallel Programs for Multiprocessors: Methodology and Examples. Read for lecture 6.

12 Ways to Fool the Masses When Giving Performance Results on Parallel Computers, D. H. Bailey.

Working Sets, Cache Sizes, and Node Granularity Issues for Large-Scale Multiprocessors

Lecture 6a : Towards Workload-Driven Architectural Evaluation: Scaling

Lecture 6b: Scaling Applications and Machines. (Deferred to 2/10/95).

NAS Parallel Benchmark Results

Methodological Considerations and Characterization of the SPLASH-2 Parallel Applications Suite

Architectural Requirements of Parallel Scientific Applications with Explicit Communication, Cypher, Ho, Konstantinidou, and Messina, ISCA 93. To be handed out in class.

ParkBench Public International Benchmarks for Parallel Computers, Sections 1-5. (Full 5 MB version).

Lecture 7a: Reflections on Published Results (2/10/95)

Lecture 7b : Picking Parameters and Analyzing Sensitivity

Lecture 8 : Choosing Metrics and Presenting Results (2/14/95).

Small-Scale Shared Memory

Chapter 3 : Small Scale Shared-Memory. Reading for Lectures 9, 10

Lecture 9 : Small-Scale Shared Memory (2/17/95).

Lecture 10 : Small-Scale Shared Memory Design Tradeoffs.

Lecture 11 : Small-Scale Shared Memory Implementation.

Lecture 12 : Small-Scale Shared Memory Implementation (cont).

Large-Scale Distributed-Memory Multiprocessors

Chapter 4A : Large Scale Distributed Memory Multiprocessors, Part A. Reading for Lecture 13

Lecture 13 : Realizing Programming Models on Large-Scale Distributed-Memory Multiprocessors.

Lecture 14 : Design of Large-Scale Distributed-Memory Multiprocessors: Part 1.

Active Messages: a Mechanism for Integrated Communication and Computation, ISCA92

Lecture 15 : Design of Large-Scale Distributed-Memory Multiprocessors: Part 2.

Intel Paragon

Experience with Active Messages on the Meiko CS-2, Schauser and Scheiman, IPPS 95

Lecture 16 : Design of Large-Scale Shared Physical Address Space

T3D

Large-Scale Shared Address Space Multiprocessors

Chapter 5A : Large Scale Shared Address Space Multiprocessors, Part A. Reading for Lecture 17

Lecture 17 : Memory Consistency Models

Lecture 18: TNET

Lecture 19 : Large-scale CC Designs

Lecture 20: Project Discussion

Lecture 21 : Case Studies: Large Scale CC-NUMA Machines

The DASH Prototype: Implementation and Performance. In Proceedings of the 19th International Symposium on Computer Architecture, pages 92-103, Gold Coast, Australia, May 1992. © 1992 by the ACM. Available: compressed PostScript

Latency Tolerance

Lecture 22: Latency Tolerance

Performance Evaluation of Memory Consistency Models for Shared Memory Multiprocessors. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 245-257, April 1991. © 1991 by the ACM. Available: compressed PostScript (124 kB).
Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors. Technical Report CSL-TR-92-523, Computer Systems Laboratory, Stanford University, May 1992. Available: compressed PostScript (186 kB) and Stanford Elib distribution (text + images, bibliography & abstract).

Scalable Interconnection Networks

Lecture 23 Design Space of Interconnection Networks

Lecture 24 Routing

Synchronization

Lecture 25 Synchronization

"Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," Mellor-Crummey and Scott, ACM TOCS, v. 9, no. 1, Feb. 1991, pp 21-65

"Synchronization Algorithms for Shared Memory Multiprocessors", Graunke and Thakkar, IEEE Computer, v. 23, no. 6, Jun. 1990.

Reactive Synchronization Algorithms for Multiprocessors, Beng-Hong Lim and Anant Agarwal, Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 25-35, October 1994.

Handouts

Handout 1 : Course Information

Handout 2 : Assignment 1, due 2/3.

FINAL PROJECTS

Final Project Schedule and Presentations

Micro-benchmarking GAM Lok T. Liu, Rich Martin, Chad Yoshikawa
Fault Tolerance and Recovery in a 1000 Node NOW Wendy Heffner, Jeff Forbes
Toward Designing and Evaluating Network Interface Support: A Case Study on the Paragon Arvind Krishnamurthy, Jeanna Neefe, Randy Wang
The TERA Project Windsor Hsu, Xi Jiang, Giao Thanh Nguyen
Parallel MPEG-1 Encoder Siddhartha Devadhar, Cedric Krumbein, Kim Man Liu
Towards Process Management on a Network of Workstations Remzi Arpaci, Andrea Dusseau, and Amin Vahdat
Characterization of a Split-C Barnes-Hut Implementation on the CM-5 Eric Anderson, Todd Hodes
Distributing Kernel Data Structures in a NOW Steve Rodrigues, Douglas Ghormley
MyriNOW: Design Parameters and Issues for a Myrinet Network for NOW, Alan Mainwaring

MINI PROJECTS (in order of submission, but not necessarily completion)

Parallel MPEG-1 Encoder Siddhartha Devadhar, Cedric Krumbein, Kim Man Liu
Architectural Character of pico-Ray Rich Martin, Lok T. Liu, Chad Yoshikawa
Characterization of a Split-C Barnes-Hut Implementation on the CM-5, Eric Anderson, Todd Hodes, Patrick Delano
Going Beyond Binary Windsor Hsu, Giao Thanh Nguyen, Xi Jiang
An Analysis of the Parkbench Parallel Benchmark Suite Douglas Ghormley, Steven Rodrigues, and Amin Vahdat
Alan, Andrea, and Remzi's Nano-Project
The Blind Men and the Elephant Arvind Krishnamurthy, Jeanna Neefe, and Randy Wang

Other places to go look on the net.

NAS Applied Research

PARKBENCH (PARallel Kernels and BENCHmarks)

David Walker's Benchmarks hop-off.

Stanford FLASH Project, including the Wisconsin Wind Tunnel

MIT Computer Architecture Group Home Page

MIT Computation Structures Group

WWW Computer Architecture Home Page

Large Scale Parallel Computers