CS267 Assignment 3: Conjugate Gradient

Due Friday 21 March 2008 at 11:59pm


Introduction

The method of conjugate gradients (CG) is an iterative technique for solving symmetric positive-definite linear systems. The conjugate gradient algorithm, popular in practice, is similar in structure to many other linear and nonlinear optimization and equation-solving algorithms, and is relatively simple to code. All these points make CG an attractive benchmark kernel. Indeed, CG appears in both the NAS parallel benchmarks and the SPEC floating point benchmark suite.

Although you are not required to understand CG or why it solves a system of equations Ax = b, the underlying principles are quite interesting. Unlike many other scientific algorithms in use, CG has a succinct, understandable text that explains it without much prerequisite knowledge of math, written by Prof. Jonathan Shewchuk. I highly recommend reading his paper.
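For orientation, the basic unpreconditioned CG iteration can be sketched in serial C roughly as follows. This is only an illustrative sketch, not the implementation handed out with the assignment: the function names (poisson1d_matvec, dot, cg_poisson1d), the zero initial guess, and the relative-residual stopping test are all assumptions made for the example. The matrix is applied matrix-free as the tridiagonal 1-d Poisson operator with stencil (-1, 2, -1).

```c
#include <assert.h>
#include <stdlib.h>

/* y = A*x for the 1-d Poisson matrix tridiag(-1, 2, -1) (hypothetical helper). */
void poisson1d_matvec(int n, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double left  = (i > 0)     ? x[i - 1] : 0.0;
        double right = (i < n - 1) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

/* Dot product of two length-n vectors (hypothetical helper). */
double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Unpreconditioned CG starting from x = 0.
 * Stops when ||r|| <= rtol * ||b||.
 * Returns the number of iterations used, or -1 if the iteration
 * did not converge within maxit steps or an allocation failed. */
int cg_poisson1d(int n, const double *b, double *x, double rtol, int maxit)
{
    assert(n > 0 && maxit > 0);
    double *r  = malloc((size_t)n * sizeof *r);   /* residual b - A*x */
    double *p  = malloc((size_t)n * sizeof *p);   /* search direction */
    double *Ap = malloc((size_t)n * sizeof *Ap);  /* A times p */
    int iters = -1;

    if (r && p && Ap) {
        for (int i = 0; i < n; ++i) {
            x[i] = 0.0;
            r[i] = b[i];
            p[i] = b[i];
        }
        double rr = dot(n, r, r);
        double stop = rtol * rtol * rr;  /* compare squared norms */
        if (rr <= stop)
            iters = 0;                   /* b is zero: x = 0 already solves it */

        for (int k = 1; iters < 0 && k <= maxit; ++k) {
            poisson1d_matvec(n, p, Ap);
            double alpha = rr / dot(n, p, Ap);
            for (int i = 0; i < n; ++i) {
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
            }
            double rr_new = dot(n, r, r);
            if (rr_new <= stop) {
                iters = k;
                break;
            }
            double beta = rr_new / rr;
            for (int i = 0; i < n; ++i)
                p[i] = r[i] + beta * p[i];
            rr = rr_new;
        }
    }
    free(r);
    free(p);
    free(Ap);
    return iters;
}
```

Each iteration consists of one matrix-vector product, two dot products, and three vector updates. In a parallel version the dot products become global reductions and the matrix-vector product needs neighboring vector entries, so those are natural places to look first when hunting for bottlenecks.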

In this project, you will parallelize CG using the UPC language, work out the performance bottlenecks, and iteratively optimize your code.

Details

You may work in groups of up to 3. The implementation we give you is capable of solving a simple model problem (the 1-d Poisson equation) or a more general sparse matrix problem. The code can read sparse matrix files in the Matrix Market format; for instance,

  ./cg gr_30_30.mtx
will solve the 30-by-30 two-dimensional Poisson problem from the Matrix Market file gr_30_30.mtx.
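For general sparse matrices, the dominant kernel in each CG iteration is the sparse matrix-vector product. A minimal sketch is shown here under the assumption that the matrix has been converted to compressed sparse row (CSR) storage after being read; the storage format used by the provided code may differ, and the name csr_matvec and its argument layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* y = A*x for an n-row matrix A in CSR form (assumed layout):
 *   row_ptr has n+1 entries; row i owns entries row_ptr[i] .. row_ptr[i+1]-1
 *   col_idx[k] is the 0-based column of the k-th stored entry
 *   val[k]     is its value */
void csr_matvec(int n, const int *row_ptr, const int *col_idx,
                const double *val, const double *x, double *y)
{
    assert(n >= 0 && row_ptr != NULL && y != NULL);
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];  /* indirect read of x */
        y[i] = sum;
    }
}
```

The indirect access x[col_idx[k]] is the part worth watching when the vectors are distributed: depending on the sparsity pattern, a row may need entries of x that live on another thread.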

We will provide a serial C implementation with a dummy preconditioner. Your tasks are to:

  1. Create an initial parallel UPC implementation
  2. Analyze the performance of your parallel implementation. What are the bottlenecks?
  3. Optimize your implementation. Restructure your code, change your parallelization strategy, etc. If you reach a point where you're pretty satisfied with the parallelization, implement a simple preconditioner (Jacobi or SSOR or some other one).
  4. Repeat steps 2 and 3 until you are either satisfied with the result or you run out of time.
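As a rough illustration of the simplest preconditioner mentioned in step 3, applying Jacobi amounts to dividing the residual by the diagonal of A. The sketch below assumes the diagonal has already been extracted into an array; the name jacobi_apply and its signature are hypothetical and are not taken from the provided code.

```c
#include <assert.h>

/* Apply the Jacobi preconditioner: z = D^{-1} r,
 * where diag[i] holds the i-th diagonal entry of A. */
void jacobi_apply(int n, const double *diag, const double *r, double *z)
{
    for (int i = 0; i < n; ++i) {
        /* A symmetric positive-definite matrix has a strictly
         * positive diagonal, so this division is safe for valid input. */
        assert(diag[i] > 0.0);
        z[i] = r[i] / diag[i];
    }
}
```

In preconditioned CG, z takes the place of r when forming the new search direction, and the scalar used for alpha and beta becomes the dot product of r and z. Note that when the diagonal is constant, as in the 1-d Poisson model problem, Jacobi only rescales the system and does not change the iteration count, so its effect is better judged on a matrix with a varying diagonal.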

For this homework assignment, the primary platforms are Jacquard and Franklin, NERSC's new flagship Cray XT4. UPC is installed on both machines; on Jacquard, simply loading the UPC module and using upcc as the compiler works. On Franklin, see here (contains information for both machines).

On Jacquard, make sure you load the ACML module before compiling/running.

Submission

Your group should put together a write-up describing changes you've made to the programs and their effects on performance. Mail me (skamil@cs) a URL to a tar file containing the report, your program code, and any necessary or useful additional information.

The primary questions of interest:

Answer the questions I've asked or implied above, and try to explain any interesting effects you see. If you don't see any, explain why not. Explanations that are based on a well-understood system model (PRAM, LogP, etc.) or well-understood programming models (e.g. comparing shared memory to PGAS to message passing) are the most convincing. The page should include appropriate speed-up plots and any other figures needed to convey your story. Note that tracing may be difficult for UPC.

The goal of this assignment is to learn a PGAS language and parallelize a scientific code that's actually used in practice. In addition, there are two avenues for improving performance here: one is improving the implementation, and the other is using math to change the algorithm.

Resources / Notes

FAQ

How do I compile OSKI on Franklin?
Easy(ish):

  1. Copy this file over src/timer/cycle.h.
  2. Use the --disable-shared option with configure, i.e. "./configure --disable-shared". I also recommend --disable-bench to save on compilation time.
  3. Edit libtool and comment out line 171 by adding a "#" to the beginning. It should look like
    #export_dynamic_flag_spec="\${wl}--export-dynamic"
Remember, Franklin does not support dynamic libraries, so you must statically compile your code. If you are unable to get your code working on Franklin, it is okay to use Bassi, but I'd really like people to work with (and around) Franklin.
Original project by David Bindel. Last updated March 8, 2008.