Vasily Volkov
CS 267 HW 0 Spring 2006

About Me

I am a first-year Ph.D. student in the EECS department with a background in applied mathematics and physics. Currently, I am interested in graphics and computational methods. In the past I have worked on finite-element cloth simulation, adaptive meshing, visualization of very large terrain data sets, and computing shortest paths on smooth surfaces.

I am interested in parallel computing because it makes it possible to solve very large computational problems. Parallel computers are also gradually becoming a commodity in the graphics hardware market, and the skill of exploiting parallelism is becoming nearly essential.

Classical Molecular Dynamics on Parallel Computers

Decades of quantitative improvement in computational power now make possible qualitatively new results that could hardly have been imagined with the first computers. One example is the simulation of macroscopic phenomena at the microscopic scale using molecular dynamics. Recently, it became possible to simulate billions of atoms on large supercomputers, such as Los Alamos' QSC [Kadau et al. 2004], ASCI White [Abraham et al. 2002], and Cray T3E [Roth et al. 2000]. This number of atoms corresponds to the micrometer scale and can be used to study properties of materials such as fracture, hardening, or phase transitions. This "computational microscope" provides spatial and temporal resolution unattainable by experimental measurements or by studies based on continuum mechanics.

To simulate such a large number of interacting particles, models of classical molecular dynamics are used. They neglect quantum effects, whose treatment would carry a substantially higher computational cost.

In classical molecular dynamics, atoms or molecules are treated as interacting point masses whose motion is integrated using Newton's second law. The interaction is described with empirical models, such as the Lennard-Jones 6-12 potential [Lomdahl et al. 1993]. The interaction forces may be either short-range or long-range, and there are a variety of methods to compute them efficiently. An example of a long-range force is the Coulomb force, which arises when particles are ionized and carry non-zero electric charge. Long-range forces are more expensive to compute and are often left outside the scope of the simulation.
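
As a rough illustration of the model just described, the sketch below evaluates the Lennard-Jones 6-12 potential and force and advances two particles on a line with Newton's second law. It is not taken from any of the cited codes. The velocity Verlet integrator, the reduced units (epsilon = sigma = mass = 1), and all function names are assumptions made for this example only.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 6-12 pair energy U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dU/dr; positive values are repulsive, negative attractive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def accelerations(x, mass=1.0):
    """Accelerations of two particles on a line (Newton's second and third laws)."""
    r = x[1] - x[0]
    f = lj_force(abs(r))
    direction = 1.0 if r > 0 else -1.0
    # A repulsive force pushes particle 1 away from particle 0, and vice versa.
    return [-direction * f / mass, direction * f / mass]


def velocity_verlet_step(x, v, dt, mass=1.0):
    """One velocity Verlet step (an assumed integrator, chosen for simplicity)."""
    a0 = accelerations(x, mass)
    x_new = [x[i] + v[i] * dt + 0.5 * a0[i] * dt * dt for i in range(2)]
    a1 = accelerations(x_new, mass)
    v_new = [v[i] + 0.5 * (a0[i] + a1[i]) * dt for i in range(2)]
    return x_new, v_new


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the two-particle system."""
    kinetic = sum(0.5 * mass * vi * vi for vi in v)
    return kinetic + lj_potential(abs(x[1] - x[0]))


if __name__ == "__main__":
    x, v = [0.0, 1.5], [0.0, 0.0]
    e_start = total_energy(x, v)
    for _ in range(1000):
        x, v = velocity_verlet_step(x, v, dt=0.001)
    print("energy drift:", total_energy(x, v) - e_start)
```

With a small time step the total energy should stay nearly constant, which is a convenient sanity check for this kind of integrator.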

Short-range forces, such as those described by the Lennard-Jones potential, can be computed efficiently in O(N) time, where N is the number of particles. To do that, a cutoff distance is introduced. Particles that are farther apart than the cutoff distance are considered non-interacting.

The computational domain is then split into cells of size slightly larger than the cutoff distance, so that each particle can interact only with particles residing either in the same cell or in one of the adjacent cells. Lomdahl et al. [1993] report from 65 to 520 particles per cell, depending on the cutoff distance.
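
The cell scheme can be sketched in a few lines. The example below is a minimal serial version under stated assumptions: a non-periodic 3D domain, cubic cells with an edge 1% larger than the cutoff (the factor 1.01 is an arbitrary choice standing in for "slightly larger"), and hypothetical function names `build_cells` and `neighbor_pairs`. It only finds the interacting pairs and does not compute forces.

```python
from collections import defaultdict
from itertools import product


def build_cells(positions, cell_size):
    """Bin particle indices into cubic cells keyed by integer cell coordinates."""
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        cells[key].append(i)
    return cells


def neighbor_pairs(positions, cutoff):
    """Return all index pairs (i, j), i < j, no farther apart than cutoff.

    Only the particle's own cell and its 26 adjacent cells are searched, so
    the work per particle is bounded when the particle density is bounded.
    """
    cell_size = 1.01 * cutoff  # assumed padding: cells slightly larger than cutoff
    cells = build_cells(positions, cell_size)
    cutoff2 = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get avoids creating empty cells while iterating over the dict.
            others = cells.get((cx + dx, cy + dy, cz + dz))
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j:
                        d2 = sum((a - b) ** 2
                                 for a, b in zip(positions[i], positions[j]))
                        if d2 <= cutoff2:
                            pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    import random
    random.seed(1)
    pts = [tuple(random.uniform(0.0, 4.0) for _ in range(3)) for _ in range(100)]
    print(len(neighbor_pairs(pts, cutoff=1.0)), "interacting pairs")
```

A production code would typically visit each pair of adjacent cells only once and keep particles sorted by cell, but the brute-force comparison inside the 27 cells is enough to show where the O(N) cost comes from.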

Parallelization of this scheme can be achieved using spatial subdivision, which is a rather standard technique in parallel programming. The domain is subdivided into rectangular boxes of cells, each assigned to a different processor, as shown schematically in Figure 1. In this case, each cell and all particles within it belong to one processor, so the computational cost scales as O(N/P), which is optimal. To handle interactions across processor boundaries, message passing is used. The amount of resulting communication is proportional to the surface area of the internal boundaries and therefore scales as O((N/P)^(2/3)), the surface of a subdomain whose volume is proportional to N/P. Additional communication is needed to redistribute particles that cross processor boundaries after they are assigned new positions.
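
To make the surface-versus-volume argument concrete, here is a small serial sketch that assigns cells to processors in regular blocks and counts, for each processor, how many of its cells touch a cell owned by someone else. It assumes a non-periodic domain whose cell counts divide evenly by the processor grid. The names `owner_rank` and `count_owned_and_boundary` are hypothetical, and no actual message passing is performed.

```python
from itertools import product


def owner_rank(cell, cells_per_dim, procs_per_dim):
    """Rank owning a cell under a regular block decomposition.

    Assumes cells_per_dim is divisible by procs_per_dim along every axis.
    """
    block = [c // p for c, p in zip(cells_per_dim, procs_per_dim)]
    px, py, pz = (cell[d] // block[d] for d in range(3))
    return (px * procs_per_dim[1] + py) * procs_per_dim[2] + pz


def count_owned_and_boundary(cells_per_dim, procs_per_dim):
    """Per-rank counts of owned cells and of owned cells adjacent to another rank."""
    owned, boundary = {}, {}
    for cell in product(*(range(n) for n in cells_per_dim)):
        rank = owner_rank(cell, cells_per_dim, procs_per_dim)
        owned[rank] = owned.get(rank, 0) + 1
        for offset in product((-1, 0, 1), repeat=3):
            nb = tuple(c + o for c, o in zip(cell, offset))
            inside = all(0 <= nb[d] < cells_per_dim[d] for d in range(3))
            if inside and owner_rank(nb, cells_per_dim, procs_per_dim) != rank:
                # Particles in this cell must be communicated to a neighbor rank.
                boundary[rank] = boundary.get(rank, 0) + 1
                break
    return owned, boundary


if __name__ == "__main__":
    for n in (8, 16, 32):
        owned, boundary = count_owned_and_boundary((n, n, n), (2, 2, 2))
        print(n, owned[0], boundary[0], boundary[0] / owned[0])
```

As the block owned by each processor grows, the fraction of its cells that lie on an internal boundary shrinks, which is the effect the scaling estimate describes.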


Figure 1. The computational domain is split into cells slightly larger than the interaction cutoff distance (thin lines), and these cells are distributed over the processors (bold lines). Reproduced from [Kadau et al. 2004].

The ratio between communication and computation in this approach scales as O((P/N)^(1/3)). This means that the relative communication overhead should be smaller when the number of particles per node is large. This expectation is supported by published observations [Roth et al. 2000, Lomdahl et al. 1993, Kadau et al. 2004]. In practice, parallel efficiencies over 90% have been achieved for very large numbers of particles. Figures 2 and 3 illustrate this behavior.


Figure 2. Benchmark of a molecular dynamics simulation with 32K atoms in total. For example, if 512 processors participate in the simulation, each of them hosts only 64 particles. Such a small number of particles per processor shifts the bottleneck to communication. This figure and the next are reproduced from Steve Plimpton's benchmark page: http://www.cs.sandia.gov/~sjplimp/lammps/bench.html.


Figure 3. Benchmark of a molecular dynamics simulation with 32K atoms per processor. Computational cost dominates over communication cost, resulting in high parallel efficiency.
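
The two benchmark regimes can be compared with the O((P/N)^(1/3)) estimate using a few lines of arithmetic. This is only the leading-order scaling with the unknown constant factor dropped, so it predicts ratios between configurations rather than measured timings. The function name is hypothetical, and 32K is taken to mean 32768.

```python
def relative_comm_overhead(particles_per_processor):
    """Communication-to-computation ratio (P/N)^(1/3), up to a constant factor."""
    return particles_per_processor ** (-1.0 / 3.0)


if __name__ == "__main__":
    small = relative_comm_overhead(32768 / 512)   # 64 particles per processor
    large = relative_comm_overhead(32768)         # 32K particles per processor
    # The estimate suggests the relative overhead differs by this factor.
    print("overhead ratio between the two regimes:", small / large)
```

Under this estimate, increasing the number of particles per processor by a factor of 512 reduces the relative communication overhead by a factor of 8.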

One of the most successful implementations of molecular dynamics was SPaSM, written in ANSI C using MPI [Kadau et al. 2004]. In 1993 it ran at 50.7 Gflops on the Connection Machine 5, which was 40% of the theoretical peak performance of that computer [Lomdahl et al. 1993]. Later, the SPaSM code was used to run a molecular dynamics simulation of 19 billion particles on Los Alamos' QSC computer with 256 computational nodes, each consisting of four Alpha EV6 processors running at 1.25 GHz and having 16 GByte of RAM.

References

[Kadau et al. 2004] Kadau, K., Germann, T. C., Lomdahl, P. S. Large-Scale Molecular Dynamics Simulation of 19 Billion Particles. International Journal of Modern Physics C, 15(1): 193-201, 2004.

[Lomdahl et al. 1993] Lomdahl, P. S., Tamayo, P., Gronbech-Jensen, N., Beazley, D. M. 50 GFlops molecular dynamics on the Connection Machine 5. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, 520-527, 1993.

[Roth et al. 2000] Roth, J., Gaehler, F., Trebin, H.-R. A molecular dynamics run with 5.180.116.000 particles. International Journal of Modern Physics C, 11(2): 317-322, 2000.

[Abraham et al. 2002] Abraham, F. F., Walkup, R., Gao, H., Duchaineau, M., De La Rubia, T. D., Seager, M. Simulating materials failure by using up to one billion atoms and the world's fastest computer: Brittle fracture. Proceedings of the National Academy of Sciences, 99(9): 5777-5782, 2002.