The adaptive mesh refinement (AMR) algorithm is used to numerically solve partial differential equations. It uses a hierarchy of rectangular grids, with finer grids nested inside coarser ones. The union of all grids at a particular refinement forms a level, which may not cover the entire domain but is completely contained within the next coarser level. The algorithm adaptively creates and destroys finer grids depending on where in the domain extra resolution is needed, according to user-defined error estimation criteria. An example two-level grid hierarchy is below, with thick lines representing coarse grids and thin lines representing fine grids.
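The nesting rule described above can be illustrated with a minimal Python sketch. It is not taken from the Berkeley Lab package: the box representation (inclusive integer cell indices in 2D), the function names, and the refinement ratio of 2 are all assumptions made for illustration, and coverage is checked by brute-force cell enumeration, which is only practical for small examples.

```python
from itertools import product

# A box is ((ilo, jlo), (ihi, jhi)) in integer cell indices, both ends inclusive.
# This representation is an assumption made for the sketch.

def cells(box):
    """Return the set of cell indices covered by one box."""
    (ilo, jlo), (ihi, jhi) = box
    return set(product(range(ilo, ihi + 1), range(jlo, jhi + 1)))

def refine(box, ratio):
    """Express a coarse box in the index space of the next finer level."""
    (ilo, jlo), (ihi, jhi) = box
    return ((ilo * ratio, jlo * ratio),
            ((ihi + 1) * ratio - 1, (jhi + 1) * ratio - 1))

def level_cells(boxes):
    """Return the cells covered by a level, i.e. the union of its grids."""
    covered = set()
    for box in boxes:
        covered |= cells(box)
    return covered

def is_nested(fine_boxes, coarse_boxes, ratio):
    """True if the fine level lies entirely inside the coarse level."""
    coarse_region = level_cells(refine(box, ratio) for box in coarse_boxes)
    return level_cells(fine_boxes) <= coarse_region

# Two coarse grids side by side, and one fine grid that straddles both.
coarse_level = [((0, 0), (3, 3)), ((4, 0), (7, 3))]
fine_level = [((6, 2), (9, 5))]

print(is_nested(fine_level, coarse_level, ratio=2))
```

Note that the fine grid here is contained in the coarse level as a whole without being contained in any single coarse grid, which is all the nesting rule requires.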

AMR can be used in a variety of applications, such as flame simulation, Navier-Stokes flows, and Euler flows. In particular, the Berkeley Astrophysical Fluid Dynamics Group uses AMR to study high-mass star formation. Such stars start burning while still accreting mass, bombarding the accreting material with radiation pressure that exceeds the gravitational force on the material. It is unknown how a high-mass star can sustain a high accretion rate despite the radiation pressure. Numerical simulations of this process require enormous computational resources when using a fine static grid. AMR allows simulations to achieve the accuracy of a fine grid at a fraction of the computational cost, since it creates fine grids only where they are needed.
The Berkeley Lab implementation of the AMR algorithm is parallelized by distributing grids to different processors. In the example above, each coarse grid, represented by a rectangle with thick boundaries, could potentially be given to a different processor. A key requirement of such a parallelization is to ensure that the work is distributed evenly among the processors. A load-balancing strategy based on an approximate knapsack algorithm is used to distribute the finest grids across processors, since such grids require the most computation. The figure below shows the average inefficiency of this algorithm for both randomly and non-randomly-sized grids, using N grids and K processors. The algorithm works better as the number of grids per processor increases.
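The text does not spell out the knapsack heuristic, so the following Python sketch substitutes a common greedy approximation: sort grids by estimated work, largest first, and hand each one to the currently least-loaded processor. The function names, the work estimates, and the definition of inefficiency used here (maximum load divided by average load, minus one) are assumptions for illustration and may differ from what the Berkeley Lab code and the figure use.

```python
import heapq

def distribute_grids(grid_work, num_procs):
    """Greedy approximate knapsack: largest grid first, to the least-loaded processor.

    grid_work[g] is the estimated work for grid g (for example, its cell count).
    Returns a list with one entry per processor, holding the grid indices it owns.
    """
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    assignment = [[] for _ in range(num_procs)]
    order = sorted(range(len(grid_work)), key=lambda g: grid_work[g], reverse=True)
    for g in order:
        load, p = heapq.heappop(heap)          # least-loaded processor so far
        assignment[p].append(g)
        heapq.heappush(heap, (load + grid_work[g], p))
    return assignment

def inefficiency(grid_work, assignment):
    """Assumed metric: how far the busiest processor exceeds the average load."""
    loads = [sum(grid_work[g] for g in grids) for grids in assignment]
    average = sum(loads) / len(loads)
    return max(loads) / average - 1.0

grid_work = [8, 7, 6, 5, 4]                    # hypothetical work per grid
assignment = distribute_grids(grid_work, num_procs=2)
print(assignment, inefficiency(grid_work, assignment))
```

With only a few grids per processor, a single large grid can dominate one processor's load; with many grids per processor the leftover imbalance is small relative to the average, which is consistent with the trend described above.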

The Berkeley Lab AMR software package runs on multiple platforms. Performance numbers provided are from a Compaq AlphaServer SC45 at the Goddard Space Flight Center. This is a distributed memory parallel machine with 104 nodes, four processors per node, and a peak performance of 2.8 TFLOP/s. It ranks 32nd out of the top 500 machines.
The package is written in C++ and FORTRAN, using specialized libraries to pass data between C++ and FORTRAN routines. The FORTRAN routines use a single program, multiple data (SPMD) approach, where the data is distributed among processors, each of which executes the same code. The C++ pieces of the program handle memory management and program flow, while the FORTRAN routines perform the numerically intensive parts of the algorithm.
The package contains two libraries of interest. The BoxLib library solves finite difference equations on unions of disjoint rectangles of the same coarseness. The AMRLib library uses BoxLib to implement the AMR aspects of the program. The package also uses MPI to implement communication based on message passing.
MFLOP/s numbers for the AlphaServer could not be found, though speedups over a single processor are available.
| Problem size | Vort Tagging Factor | N Points Updated | Processors | AMR Run time (sec) | Rate (points/sec-proc) | Unscaled speedup |
|---|---|---|---|---|---|---|
| 32x32x48 | 0.0050 | 35864576 | 1 | 4833 | 7421 | 1 |
| 32x32x48 | 0.0050 | 35864576 | 4 | 1360 | 6593 | 3.6 |
| 32x32x48 | 0.0050 | 35864576 | 16 | 571 | 3926 | 8.5 |
| 32x32x48 | 0.0050 | 35864576 | 32 | 403 | 2781 | 12.0 |
| 64x64x96 | 0.0025 | 190840832 | 1 | 39638 | 4815 | 1 |
| 64x64x96 | 0.0025 | 190840832 | 16 | 3019 | 3951 | 13.1 |
| 64x64x96 | 0.0025 | 190840832 | 32 | 2699 | 2210 | 14.7 |
| 64x64x96 | 0.0025 | 190840832 | 64 | 1988 | 1500 | 26.2 |
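The derived columns of the table above can be recomputed from the raw ones. The Python sketch below assumes that the rate is points updated divided by the product of run time and processor count, and that the unscaled speedup is the single-processor run time divided by the parallel run time. Parallel efficiency (speedup divided by processor count) is not in the table and is added here only for illustration. The function names are hypothetical.

```python
def rate(points, seconds, procs):
    """Points updated per second per processor."""
    return points / (seconds * procs)

def speedup(t_serial, t_parallel):
    """Unscaled speedup relative to the single-processor run."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, procs):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup(t_serial, t_parallel) / procs

# (processors, run time in seconds) for the 32x32x48 problem, from the table.
points = 35864576
runs = [(1, 4833), (4, 1360), (16, 571), (32, 403)]
t_serial = runs[0][1]

for procs, seconds in runs:
    print(procs,
          round(rate(points, seconds, procs)),
          round(speedup(t_serial, seconds), 1),
          round(efficiency(t_serial, seconds, procs), 2))
```

Under these assumptions the formulas reproduce the rate column and the speedups of the 32x32x48 runs. For the 64-processor 64x64x96 run, however, the same speedup formula gives about 19.9 rather than the listed 26.2, so that entry may rest on a different baseline.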
Sub-linear scaling can be attributed to multiple issues. As mentioned previously, the load-balancing algorithm becomes less efficient as the number of grids per processor decreases, which is what happens when more processors are applied to a fixed problem size. The algorithm also requires communication between processors at grid boundaries, so the cost of communication increases as the number of processors increases.
AMR allows much larger problem sizes to be solved than static meshes do, since only parts of the domain need to be resolved finely. However, further work is needed to improve the load-balancing algorithm and to make communication more efficient before speedups approach linear scaling.
Berkeley Astrophysical Fluid Dynamics Group
Parallelization of Structured, Hierarchical Adaptive Mesh Refinement Algorithms
Incompressible Navier-Stokes Baseline Performance Measurement