CS267 Assignment 2: Parallelize Particle Simulation
Due Wednesday 4 March 2009 at 11:59pm
Overview
The purpose of this assignment is to introduce you to programming in shared and distributed memory models.
Your goal is to parallelize a toy particle simulator (similar particle simulators are used in mechanics, biology, astronomy, etc.) that reproduces the behaviour shown in the following animation:
The range of interaction forces is limited as shown in grey for a selected particle.
Density is set sufficiently low so that given n particles, only O(n) interactions are
expected.
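One common way to exploit the limited interaction range is to sort particles into a uniform grid whose cells are at least as wide as the cutoff, so that each particle only needs to be checked against the 3x3 block of cells around it. The following is a minimal sketch of that idea, not the method used in the supplied files; the `Particle` and `Grid` types, the function names, and the square domain starting at the origin are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal particle; the supplied common.h defines its own type.
struct Particle { double x, y; };

// Uniform grid over the square [0, box_size) x [0, box_size) (an assumption).
struct Grid {
    int cells_per_side;
    double cell_size;                     // chosen so that cell_size >= cutoff
    std::vector<std::vector<int>> cells;  // particle indices stored per cell
};

// Map a coordinate to a cell index along one axis, clamped to the grid.
int cell_coord(const Grid& g, double v) {
    const int c = static_cast<int>(v / g.cell_size);
    return std::max(0, std::min(g.cells_per_side - 1, c));
}

// Place every particle index into its cell: O(n) work.
Grid build_grid(const std::vector<Particle>& p, double box_size, double cutoff) {
    assert(box_size > 0.0 && cutoff > 0.0);
    Grid g;
    g.cells_per_side = std::max(1, static_cast<int>(box_size / cutoff));
    g.cell_size = box_size / g.cells_per_side;
    g.cells.assign(static_cast<std::size_t>(g.cells_per_side) * g.cells_per_side,
                   std::vector<int>());
    for (int i = 0; i < static_cast<int>(p.size()); ++i) {
        const int cx = cell_coord(g, p[i].x);
        const int cy = cell_coord(g, p[i].y);
        g.cells[cy * g.cells_per_side + cx].push_back(i);
    }
    return g;
}

// Count unordered pairs closer than cutoff, scanning only the 3x3 block of
// cells around each particle instead of all n particles.
long count_close_pairs(const std::vector<Particle>& p, const Grid& g, double cutoff) {
    long pairs = 0;
    const int last = g.cells_per_side - 1;
    for (int i = 0; i < static_cast<int>(p.size()); ++i) {
        const int cx = cell_coord(g, p[i].x);
        const int cy = cell_coord(g, p[i].y);
        for (int ny = std::max(0, cy - 1); ny <= std::min(last, cy + 1); ++ny) {
            for (int nx = std::max(0, cx - 1); nx <= std::min(last, cx + 1); ++nx) {
                for (int j : g.cells[ny * g.cells_per_side + nx]) {
                    if (j <= i) continue;  // visit each pair once
                    const double dx = p[i].x - p[j].x;
                    const double dy = p[i].y - p[j].y;
                    if (dx * dx + dy * dy < cutoff * cutoff) ++pairs;
                }
            }
        }
    }
    return pairs;
}
```

With a bounded number of particles per cell, which is what the low density suggests, both functions do O(n) work. In a real simulator the pair counter would be replaced by the force computation, and the grid would be rebuilt or updated every time step.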
Suppose we have a code that runs in time T = O(n) on a single processor.
Then we'd hope to run in time T/p when using p processors.
We'd like you to write parallel codes that approach these expectations.
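The T/p expectation can be made concrete with two standard quantities: speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by p). The helpers below are a small sketch; the function names are hypothetical and not part of the supplied code.

```cpp
#include <cassert>

// Speedup: how many times faster the parallel run is than the serial run.
double speedup(double t_serial, double t_parallel) {
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

// Parallel efficiency: fraction of the ideal p-times speedup achieved.
// A value of 1.0 means the parallel run took exactly T/p.
double parallel_efficiency(double t_serial, double t_parallel, int p) {
    assert(p > 0);
    return speedup(t_serial, t_parallel) / p;
}
```

For example, with made-up timings of 8 seconds serial and 2 seconds on 8 processors, `speedup` gives 4 and `parallel_efficiency` gives 0.5, i.e. half of the ideal.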
Source Code
You may start with the serial and parallel implementations supplied below. All of them
run in O(n^2) time, which is unacceptably inefficient.
- serial.cpp: a serial implementation,
- openmp.cpp: a shared memory parallel implementation done using OpenMP,
- pthreads.cpp: a shared memory parallel implementation done using pthreads (if you prefer it over OpenMP),
- mpi.cpp: a distributed memory parallel implementation done using MPI,
- common.cpp, common.h: an implementation of common functionality, such as I/O, numerics and timing,
- Makefile: a makefile that should work on all NERSC clusters if you uncomment appropriate lines,
- job-franklin-serial, job-franklin-pthreads4, job-franklin-openmp4, job-franklin-mpi4, job-bassi-serial, job-bassi-pthreads8, job-bassi-openmp8, job-bassi-mpi8: sample batch files to launch jobs on Franklin and Bassi. Use qsub to submit on Franklin and llsubmit to submit on Bassi.
- particles.tar: all above files in one tarball.
You are welcome to use any NERSC cluster in this assignment. If you wish to build the code
on other systems, you might need a custom implementation of the pthread barrier, such as:
pthread_barrier.c,
pthread_barrier.h.
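To give a feel for what such a replacement involves, here is a minimal sketch of a reusable barrier built from a pthread mutex and condition variable. It is not the contents of the linked pthread_barrier.c; the `SimpleBarrier` type and the `simple_barrier_*` names are hypothetical, and error codes from the pthread calls are ignored for brevity.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical barrier type for illustration only.
struct SimpleBarrier {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int threshold;        // number of threads that must arrive
    int waiting;          // threads that have arrived in the current round
    unsigned generation;  // round counter; protects against spurious wakeups
};

void simple_barrier_init(SimpleBarrier* b, int threshold) {
    assert(threshold > 0);
    pthread_mutex_init(&b->mutex, nullptr);
    pthread_cond_init(&b->cond, nullptr);
    b->threshold = threshold;
    b->waiting = 0;
    b->generation = 0;
}

// Blocks until `threshold` threads have called this function.
// Returns true in exactly one thread per round (the last one to arrive).
bool simple_barrier_wait(SimpleBarrier* b) {
    pthread_mutex_lock(&b->mutex);
    const unsigned my_generation = b->generation;
    if (++b->waiting == b->threshold) {
        // Last arrival: reset for the next round and release everyone.
        b->waiting = 0;
        ++b->generation;
        pthread_cond_broadcast(&b->cond);
        pthread_mutex_unlock(&b->mutex);
        return true;
    }
    // Wait until the round counter changes, not merely until woken.
    while (my_generation == b->generation) {
        pthread_cond_wait(&b->cond, &b->mutex);
    }
    pthread_mutex_unlock(&b->mutex);
    return false;
}

void simple_barrier_destroy(SimpleBarrier* b) {
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->mutex);
}
```

The generation counter is what makes the barrier safe to reuse across time steps: a thread that races ahead into the next round cannot be mistaken for a late arrival from the previous one.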
You may consider using the following visualization program
to check the correctness of the result produced by your code: Linux/Mac version (requires
SDL), Windows version.
Submission
You may work in groups of 2 or 3. One person in your
group should be a non-CS student (if possible), but otherwise
you're responsible for finding a group.
Mail me (Vasily) your report and source codes.
Here is the list of items you might show in your report:
- A plot in log-log scale that shows that your serial and parallel codes run in O(n) time
and a description of the data structures that you used to achieve it.
- A description of the synchronization you used in the shared memory implementation.
- A description of the communication you used in the distributed memory implementation.
- A description of the design choices that you tried and how they affected performance.
- Speedup plots that show how closely your parallel codes approach the idealized p-times speedup
and a discussion on whether it is possible to do better.
- Where does the time go?
Consider breaking down the runtime into computation time, synchronization time and/or communication time.
How do they scale with p?
- A discussion on using pthreads, OpenMP and MPI.
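For the log-log plot in the first item, one way to back up a visual impression is to estimate the slope of log(time) against log(n) with a least-squares fit: a slope near 1 is consistent with O(n) scaling, and a slope near 2 with O(n^2). The sketch below assumes you have already collected problem sizes and measured runtimes; the function name `loglog_slope` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares slope of log(t) versus log(n).
// n: problem sizes, t: measured runtimes (both must be positive).
double loglog_slope(const std::vector<double>& n, const std::vector<double>& t) {
    assert(n.size() == t.size() && n.size() >= 2);
    const std::size_t m = n.size();

    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
        assert(n[i] > 0.0 && t[i] > 0.0);
        mean_x += std::log(n[i]);
        mean_y += std::log(t[i]);
    }
    mean_x /= static_cast<double>(m);
    mean_y /= static_cast<double>(m);

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
        const double dx = std::log(n[i]) - mean_x;
        sxy += dx * (std::log(t[i]) - mean_y);
        sxx += dx * dx;
    }
    assert(sxx > 0.0);  // needs at least two distinct problem sizes
    return sxy / sxx;
}
```

Small problem sizes are often dominated by fixed overheads, so it may be worth fitting only the larger sizes or reporting the slope over several ranges.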
Resources