Adaptive Mesh Refinement

 

The Algorithm

The adaptive mesh refinement (AMR) algorithm is used to numerically solve partial differential equations. It uses a hierarchy of rectangular grids, with finer grids nested inside coarser ones. The union of all grids at a particular refinement composes a level, which may not cover the entire domain but is completely contained within the next coarser level. The algorithm adaptively creates and destroys finer grids depending on where in the domain extra resolution is needed, according to user-defined error estimation criteria. An example two-level grid hierarchy is shown below, with thick lines representing coarse grids and thin lines representing fine grids.
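As a rough illustration of the tag-and-refine step described in this section, the Python sketch below flags coarse cells where the solution jumps sharply and covers them with one finer rectangle. It is not the Berkeley Lab implementation: the function names (`tag_cells`, `bounding_box`, `refine`), the jump-based error criterion, and the refinement ratio of 2 are all assumptions chosen for illustration, and production codes typically cluster tagged cells into several rectangles rather than a single bounding box.

```python
def tag_cells(u, threshold):
    """Return the set of (i, j) cells whose jump to a neighbour exceeds threshold.

    u is a 2-D list of cell values on the coarse grid; the jump test is a
    stand-in for a user-defined error estimation criterion.
    """
    ni, nj = len(u), len(u[0])
    tagged = set()
    for i in range(ni):
        for j in range(nj):
            # Compare each cell with its neighbour below and to the right.
            for di, dj in ((1, 0), (0, 1)):
                ii, jj = i + di, j + dj
                if ii < ni and jj < nj and abs(u[ii][jj] - u[i][j]) > threshold:
                    tagged.add((i, j))
                    tagged.add((ii, jj))
    return tagged


def bounding_box(tagged):
    """Smallest rectangle (i_lo, j_lo, i_hi, j_hi), inclusive, covering all tags."""
    if not tagged:
        return None
    rows = [i for i, _ in tagged]
    cols = [j for _, j in tagged]
    return (min(rows), min(cols), max(rows), max(cols))


def refine(box, ratio=2):
    """Express a coarse-index box in the index space of the next finer level."""
    i_lo, j_lo, i_hi, j_hi = box
    return (i_lo * ratio, j_lo * ratio,
            (i_hi + 1) * ratio - 1, (j_hi + 1) * ratio - 1)


# Hypothetical 8x8 coarse grid holding a sharp front between columns 3 and 4.
coarse = [[0.0 if j < 4 else 1.0 for j in range(8)] for i in range(8)]
tags = tag_cells(coarse, threshold=0.5)
coarse_box = bounding_box(tags)
fine_box = refine(coarse_box)
```

Only the two columns next to the front are tagged, so the fine grid covers a narrow strip of the domain instead of all of it. Because the fine box is derived from coarse cells, it is contained in the coarse level by construction.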

 

Applications

AMR can be used in a variety of applications, such as flame simulation, Navier-Stokes flows, and Euler flows. In particular, the Berkeley Astrophysical Fluid Dynamics Group uses AMR to study high-mass star formation. Such stars start burning while still accreting mass, bombarding the accreting material with radiation pressure that exceeds the gravitational force on the material. It is unknown how a high-mass star can sustain a high accretion rate despite the radiation pressure. Numerical simulations of this process require enormous amounts of computational resources when using a fine static grid. AMR allows simulations to achieve the accuracy of fine grids without the full computational cost, since it creates fine grids only where they are needed.

Parallelization

The Berkeley Lab implementation of the AMR algorithm is parallelized by distributing grids to different processors. In the example above, each coarse grid, represented by a rectangle with thick boundaries, could potentially be given to a different processor. A key requirement of such a parallelization is to ensure that the work is distributed evenly among the processors. A load-balancing strategy based on an approximate knapsack algorithm distributes the finest grids across processors, since those grids require the most computation. The figure below shows the average inefficiency of this algorithm for both randomly and non-randomly sized grids, using N grids and K processors. The algorithm works better as the number of grids per processor increases.
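The source does not spell out the knapsack approximation, so the following Python sketch substitutes a common greedy heuristic (heaviest grid first, always onto the least-loaded processor) to show the idea. The names `distribute` and `inefficiency`, the use of cell counts as the work estimate, and the definition of inefficiency as one minus the ratio of average to maximum load are assumptions for illustration only.

```python
import heapq


def distribute(weights, n_procs):
    """Greedy load balancing: heaviest grid first, onto the least-loaded processor.

    weights[g] is the estimated work of grid g (here, its cell count).
    Returns (owner, loads): owner[g] is the processor for grid g and
    loads[p] is the total work assigned to processor p.
    """
    heap = [(0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    owner = {}
    for g in sorted(range(len(weights)), key=lambda g: -weights[g]):
        load, p = heapq.heappop(heap)
        owner[g] = p
        heapq.heappush(heap, (load + weights[g], p))
    loads = [0] * n_procs
    for g, p in owner.items():
        loads[p] += weights[g]
    return owner, loads


def inefficiency(loads):
    """Fraction of processor time left idle while waiting for the busiest one."""
    return 1.0 - (sum(loads) / len(loads)) / max(loads)


# Hypothetical fine-grid sizes (cells per grid) spread over 4 processors.
many_grids = [512, 384, 384, 256, 256, 128, 64, 64]
few_grids = [512, 384, 384, 256, 256]

_, loads_many = distribute(many_grids, 4)
_, loads_few = distribute(few_grids, 4)
```

With eight grids the four processors end up with equal loads, while with only five grids two processors sit partly idle. This is consistent with the observation that balance improves as the number of grids per processor grows.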

Platform

The Berkeley Lab AMR software package runs on multiple platforms. The performance numbers given here are from a Compaq AlphaServer SC45 at the Goddard Space Flight Center. This is a distributed-memory parallel machine with 104 nodes, four processors per node, and a peak performance of 2.8 TFLOP/s. It ranks 32nd out of the top 500 machines.

 

The package is written in C++ and FORTRAN, using specialized libraries to pass data between C++ and FORTRAN routines. The FORTRAN routines use a single program, multiple data (SPMD) approach, in which the data is distributed among processors, each of which executes the same code. The C++ pieces of the program handle memory management and program flow, while the FORTRAN routines perform the numerically intensive parts of the algorithm.

 

The package contains two libraries of interest. The BoxLib library solves finite difference equations on unions of disjoint rectangles of the same coarseness. The AMRLib library uses BoxLib to implement the AMR aspects of the program. The package also uses MPI to implement communication based on message passing.
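To make "unions of disjoint rectangles" concrete, here is a small Python sketch of how a level could be represented as a list of index boxes and checked for overlap. This is not BoxLib's API: the box representation and the helper names `intersects`, `is_disjoint`, and `num_cells` are hypothetical and exist only for this example.

```python
from itertools import combinations

# A box is a pair (lo, hi) of inclusive integer index tuples, one entry per
# dimension, e.g. ((0, 0), (3, 3)) is a 4x4 rectangle.


def intersects(a, b):
    """True if boxes a and b share at least one cell."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]))


def is_disjoint(level):
    """True if no two boxes in the level overlap."""
    return not any(intersects(a, b) for a, b in combinations(level, 2))


def num_cells(level):
    """Total cell count of a level, valid when its boxes are disjoint."""
    total = 0
    for lo, hi in level:
        cells = 1
        for l, h in zip(lo, hi):
            cells *= h - l + 1
        total += cells
    return total


# Two side-by-side 4x4 boxes form a valid level of 32 cells.
level = [((0, 0), (3, 3)), ((4, 0), (7, 3))]
# Adding a box that straddles both makes the union overlap.
overlapping = level + [((3, 3), (5, 5))]
```

Keeping the boxes disjoint means every cell belongs to exactly one grid, so work estimates such as `num_cells` can simply be summed per box.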

Performance

MFLOP/s numbers for the AlphaServer could not be found, though speedups over a single processor are available.

 

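| Problem size | Vort tagging factor | N points updated | N procs | AMR run time (sec) | Rate (points/sec-proc) | Unscaled speedup |
|--------------|---------------------|------------------|---------|--------------------|------------------------|------------------|
| 32x32x48     | 0.0050              | 35864576         | 1       | 4833               | 7421                   | 1                |
| 32x32x48     | 0.0050              | 35864576         | 4       | 1360               | 6593                   | 3.6              |
| 32x32x48     | 0.0050              | 35864576         | 16      | 571                | 3926                   | 8.5              |
| 32x32x48     | 0.0050              | 35864576         | 32      | 403                | 2781                   | 12.0             |
| 64x64x96     | 0.0025              | 190840832        | 1       | 39638              | 4815                   | 1                |
| 64x64x96     | 0.0025              | 190840832        | 16      | 3019               | 3951                   | 13.1             |
| 64x64x96     | 0.0025              | 190840832        | 32      | 2699               | 2210                   | 14.7             |
| 64x64x96     | 0.0025              | 190840832        | 64      | 1988               | 1500                   | 26.2             |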

 

The sub-linear scaling can be attributed to several factors. As mentioned previously, the load-balancing algorithm becomes less efficient as the number of grids per processor decreases, which is what happens when more processors are applied to a fixed problem. The algorithm also requires communication between processors at grid boundaries, so the cost of communication grows as the number of processors increases.
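The derived columns of the performance table can be recomputed from the raw run times. The Python sketch below does this for the 32x32x48 runs, assuming that the unscaled speedup is the single-processor run time divided by the parallel run time and that the rate is points updated divided by the product of run time and processor count. The parallel efficiency column (speedup divided by processor count) is not in the original table and is added here only to quantify how far the runs are from linear scaling.

```python
# (processors, run time in seconds) for the 32x32x48 problem in the table.
runs = [(1, 4833), (4, 1360), (16, 571), (32, 403)]
points_updated = 35864576


def scaling(runs, points):
    """Return (procs, speedup, efficiency, rate) for each run.

    speedup    = T(1) / T(p)
    efficiency = speedup / p
    rate       = points / (T(p) * p), in points per second per processor
    """
    t1 = dict(runs)[1]  # single-processor run time used as the baseline
    rows = []
    for procs, seconds in runs:
        speedup = t1 / seconds
        efficiency = speedup / procs
        rate = points / (seconds * procs)
        rows.append((procs, round(speedup, 1), round(efficiency, 2), round(rate)))
    return rows


for row in scaling(runs, points_updated):
    print("procs=%2d  speedup=%4.1f  efficiency=%.2f  rate=%d" % row)
```

On 32 processors the efficiency drops to roughly 0.37, meaning each processor updates only about a third as many points per second as the single-processor run.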

Conclusion

AMR allows much larger problems to be solved than static meshes do, since only parts of the domain need to be resolved finely. However, work remains to improve the load-balancing algorithm and to make communication more efficient in order to bring speedups closer to linear.

Links/Sources

Berkeley AMR

Berkeley Astrophysical Fluid Dynamics Group

Parallelization of Structured, Hierarchical Adaptive Mesh Refinement Algorithms

Incompressible Navier-Stokes Baseline Performance Measurement

PYRAMID Parallel Unstructured Adaptive Mesh Refinement