CS267 Assignment 2: Parallelize Particle Simulation

Due Tuesday 1 March 2011 at 11:59pm

Overview

The purpose of this assignment is to introduce you to programming in shared and distributed memory models.

Your goal is to parallelize a toy particle simulator (similar particle simulators are used in mechanics, biology, astronomy, etc.) that reproduces the behavior shown in the following animation:

The range of interaction forces is limited as shown in grey for a selected particle. Density is set sufficiently low so that given n particles, only O(n) interactions are expected.

Suppose we have a code that runs in time T = O(n) on a single processor. Then we'd hope to run in time T/p when using p processors. We'd like you to write parallel codes that approach these expectations.

Source Code

You may start with the serial and parallel implementations supplied below. All of them run in O(n²) time, which is unacceptably inefficient.
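The standard route from O(n²) to O(n) is spatial binning: divide the domain into bins at least one interaction cutoff wide, so each particle only needs to examine its own bin and the eight neighboring bins. Below is a minimal sketch of the binning step; the `particle_t` fields and the `size`/`cutoff` names loosely follow the assignment's common.h, but the exact signatures here are assumptions, not the course code.

```cpp
#include <algorithm>
#include <vector>

struct particle_t { double x, y, vx, vy, ax, ay; };

// Assign each particle to a bin on a uniform grid whose bin side is >= cutoff.
// Returns, for each bin, the indices of the particles it contains; nb is set
// to the number of bins per side.
std::vector<std::vector<int>> build_bins(const std::vector<particle_t>& p,
                                         double size, double cutoff, int& nb) {
    nb = std::max(1, (int)(size / cutoff));   // bins per side
    double bin_size = size / nb;
    std::vector<std::vector<int>> bins(nb * nb);
    for (int i = 0; i < (int)p.size(); i++) {
        int bx = std::min(nb - 1, (int)(p[i].x / bin_size));
        int by = std::min(nb - 1, (int)(p[i].y / bin_size));
        bins[by * nb + bx].push_back(i);
    }
    return bins;
}
```

With low density, each bin holds O(1) particles, so looping over each particle's 3×3 neighborhood of bins yields O(n) total interactions. Rebuilding the bins every step is itself O(n), so it does not change the asymptotic cost.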

serial.cpp
a serial implementation,
openmp.cpp
a shared memory parallel implementation done using OpenMP,
pthreads.cpp
a shared memory parallel implementation done using pthreads (if you prefer it over OpenMP),
mpi.cpp
a distributed memory parallel implementation done using MPI,
common.cpp, common.h
an implementation of common functionality, such as I/O, numerics and timing,
Makefile
a makefile that should work on all NERSC clusters once you uncomment the appropriate lines,
job-franklin-serial, job-franklin-pthreads4, job-franklin-openmp4, job-franklin-mpi4,
job-hopper-serial, job-hopper-pthreads24, job-hopper-openmp24, job-hopper-mpi24
sample batch files to launch jobs on Franklin and Hopper. Use qsub to submit on either machine.
particles.tar
all of the above files in one tarball.
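As a rough starting point for the shared-memory version, the naive force loop can be parallelized with a single OpenMP pragma, because iteration i writes only particle i's accumulators. The sketch below uses a simplified stand-in for the assignment's apply_force; the force formula and names are assumptions for illustration, not the course code.

```cpp
#include <cmath>
#include <vector>

struct particle_t { double x, y, vx, vy, ax, ay; };

// Simplified short-range repulsive force, standing in for apply_force()
// in common.cpp. Particles farther apart than cutoff do not interact.
void apply_force(particle_t& p, const particle_t& q, double cutoff) {
    double dx = q.x - p.x, dy = q.y - p.y;
    double r2 = dx * dx + dy * dy;
    if (r2 > cutoff * cutoff || r2 == 0) return;
    double r = std::sqrt(r2);
    double coef = (1 - cutoff / r) / r2;
    p.ax += coef * dx;
    p.ay += coef * dy;
}

// Each particle's accelerations are written only by the thread that owns
// iteration i, so the loop parallelizes without locks or atomics.
void compute_forces(std::vector<particle_t>& parts, double cutoff) {
    #pragma omp parallel for
    for (int i = 0; i < (int)parts.size(); i++) {
        parts[i].ax = parts[i].ay = 0;
        for (int j = 0; j < (int)parts.size(); j++)
            if (i != j) apply_force(parts[i], parts[j], cutoff);
    }
}
```

Note this only parallelizes the O(n²) loop; combining it with binning is what gets you toward O(n/p).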

You are welcome to use any NERSC cluster for this assignment. If you wish to build on other systems, you may need a custom implementation of the pthread barrier, such as: pthread_barrier.c, pthread_barrier.h.

You may consider using the following visualization program to check the correctness of the results produced by your code: Linux/Mac version (requires SDL), Windows version.

Submission

You may work in groups of 2 or 3. One person in your group should be a non-CS student; otherwise, you are responsible for forming your own group. After you have chosen a group, please come to GSI office hours to discuss the division of work among team members. Email the GSIs your report and source code. Here is the list of items you might show in your report:

Resources

Part 2: GPU

Due Tuesday 8 March 2011 at 11:59pm

Overview

You will also be running this assignment on GPUs. You have access to Dirac, an experimental GPU cluster at NERSC. Each node has an NVIDIA Tesla C2050 as well as two quad-core CPUs (see the NERSC Dirac webpage for more detailed information).

Source Code

We will provide a naive O(n²) GPU implementation, similar to the OpenMP, pthreads, and MPI codes listed above. It will be your task to make the necessary algorithmic changes and machine optimizations to achieve favorable performance across a range of problem sizes.

Help

It may help to have a clean O(n) serial CPU implementation as a reference. If you feel this will help you, please e-mail the GSIs after Part 1 is due and we can provide this.

Submission

Please include a section in your report detailing your GPU implementation, as well as its performance over varying numbers of particles. Here is the list of items you might show in your report:

GPU Resources:

