CS 258 Course Materials
Readings and Lecture Slides
Fundamentals and Introduction
Chapter 1 : Fundamentals. Reading for Lectures 1, 2, 3.
Lecture 1 : Why Parallel Architecture. 1/18/95
Lecture 2 and 3 : Evolution of Parallel Machines. 1/23/95 and 1/25/95
Parallel Software Basics
Chapter 2A : Parallel Software Basics, Part A. Reading for Lectures 4, 5.
Lecture 4 : Parallel Software Basics. 2/1/95
Lecture 5 : Programming for Performance. 2/3/95
Scaling Parallel Programs for Multiprocessors: Methodology and Examples. Reading for Lecture 6.
12 Ways to Fool the Masses When Giving Performance Results on Parallel Computers, D. H. Bailey.
Working Sets, Cache Sizes, and Node Granularity Issues for Large-Scale Multiprocessors
Lecture 6a : Towards Workload-Driven Architectural Evaluation: Scaling
Lecture 6b : Scaling Applications and Machines (deferred to 2/10/95).
NAS Parallel Benchmark Results
Methodological Considerations and Characterization of the SPLASH-2 Parallel Applications Suite
Architectural Requirements of Parallel Scientific Applications with Explicit Communication, Cypher, Ho, Konstantinidou, and Messina, ISCA 93. To be handed out in class.
ParkBench Public International Benchmarks for Parallel Computers, Sections 1-5 (Full 5 MB version).
Lecture 7a : Reflections on Published Results (2/10/95)
Lecture 7b : Picking Parameters and Analyzing Sensitivity
Lecture 8 : Choosing Metrics and Presenting Results (2/14/95).
Small-Scale Shared Memory
Chapter 3 : Small Scale Shared-Memory. Reading for Lectures 9, 10
Lecture 9 : Small-Scale Shared Memory (2/17/95).
Lecture 10 : Small-Scale Shared Memory Design Tradeoffs.
Lecture 11 : Small-Scale Shared Memory Implementation.
Lecture 12 : Small-Scale Shared Memory Implementation (cont).
Large-Scale Distributed-Memory Multiprocessors
Chapter 4A : Large Scale Distributed Memory Multiprocessors, Part A. Reading for Lecture 13
Lecture 13 : Realizing Programming Models on Large-Scale Distributed-Memory Multiprocessors.
Lecture 14 : Design of Large-Scale Distributed-Memory Multiprocessors: Part 1.
Active Messages: a Mechanism for Integrated Communication and Computation, ISCA 92
Lecture 15 : Design of Large-Scale Distributed-Memory Multiprocessors: Part 2.
Intel Paragon
Experience with Active Messages on the Meiko CS-2, Schauser and Scheiman, IPPS 95
Lecture 16 : Design of Large-Scale Shared Physical Address Space
T3D
Large-Scale Shared Address Space Multiprocessors
Chapter 5A : Large Scale Shared Address Space Multiprocessors, Part A. Reading for Lecture 17
Lecture 17 : Memory Consistency Models
Lecture 18: TNET
Lecture 19 : Large-scale CC Designs
Lecture 20: Project Discussion
Lecture 21 : Case Studies: Large Scale CC-NUMA Machines
- The DASH Prototype: Implementation and Performance. In Proceedings of the 19th International Symposium on Computer Architecture, pages 92-103, Gold Coast, Australia, May 1992. © 1992 by the ACM. Available: compressed PostScript
Latency Tolerance
Lecture 22: Latency Tolerance
- Performance Evaluation of Memory Consistency Models for Shared Memory Multiprocessors. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 245-257, April 1991. © 1991 by the ACM. Available: compressed PostScript (124 kB).
- Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors. Technical Report CSL-TR-92-523, Computer Systems Laboratory, Stanford University, May 1992. Available: compressed PostScript (186 kB) and Stanford Elib distribution (text + images, bibliography & abstract).
Scalable Interconnection Networks
Lecture 23 : Design Space of Interconnection Networks
Lecture 24 : Routing
Synchronization
Lecture 25 : Synchronization
"Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," Mellor-Crummey and Scott, ACM TOCS, v. 9, no. 1, Feb. 1991, pp. 21-65.
"Synchronization Algorithms for Shared-Memory Multiprocessors," Graunke and Thakkar, IEEE Computer, v. 23, no. 6, June 1990.
"Reactive Synchronization Algorithms for Multiprocessors," Beng-Hong Lim and Anant Agarwal, Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 25-35, October 1994.
Handouts
Handout 1 : Course Information
Handout 2 : Assignment 1, due 2/3.
FINAL PROJECTS
Final Project Schedule and Presentations
- Micro-benchmarking GAM
Lok T. Liu, Rich Martin, Chad Yoshikawa
- Fault Tolerance and Recovery in a 1000 Node NOW
Wendy Heffner, Jeff Forbes
- Toward Designing and Evaluating Network Interface Support: A Case Study on the Paragon
Arvind Krishnamurthy, Jeanna Neefe, Randy Wang
- The TERA Project
Windsor Hsu, Xi Jiang, Giao Thanh Nguyen
- Parallel MPEG-1 Encoder
Siddhartha Devadhar, Cedric Krumbein, Kim Man Liu
- Towards Process Management on a Network of Workstations
Remzi Arpaci, Andrea Dusseau, and Amin Vahdat
- Characterization of a Split-C Barnes-Hut Implementation on the CM-5
Eric Anderson, Todd Hodes
- Distributing Kernel Data Structures in a NOW
Steve Rodrigues, Douglas Ghormley
- MyriNOW: Design Parameters and Issues for a Myrinet Network for NOW
Alan Mainwaring
MINI PROJECTS (in order of submission, but not necessarily completion)
- Parallel MPEG-1 Encoder
Siddhartha Devadhar, Cedric Krumbein, Kim Man Liu
- Architectural Character of pico-Ray
Rich Martin, Lok T. Liu, Chad Yoshikawa
- Characterization of a Split-C Barnes-Hut Implementation on the CM-5
Eric Anderson, Todd Hodes, Patrick Delano
- Going Beyond Binary
Windsor Hsu, Giao Thanh Nguyen, Xi Jiang
- An Analysis of the Parkbench Parallel Benchmark Suite
Douglas Ghormley, Steven Rodrigues, and Amin Vahdat
- Alan, Andrea, and Remzi's Nano-Project
- The Blind Men and the Elephant
Arvind Krishnamurthy, Jeanna Neefe, and Randy Wang
Other places to go look on the net.
NAS Applied Research
PARKBENCH (PARallel Kernels and BENCHmarks)
David Walker's Benchmarks hop-off.
Stanford FLASH Project, including the Wisconsin Wind Tunnel
MIT Computer Architecture Group Home Page
MIT Computation Structures Group
WWW Computer Architecture Home Page
Large Scale Parallel Computers