Parallel Algebraic Multigrid

Hormozd Gahvari
CS 267 Homework 0

I'm a first-year Ph.D. student in Computer Science. My research interests are in numerical methods and scientific computing, and I work in the Berkeley Benchmark and Optimization (BeBOP) group. These interests lead me to parallel computing, which I'll need to have a good understanding of, as lots of scientific computing work these days is done on parallel computers. Hence, my enrollment in CS 267.

Introduction: Basic Multigrid

Many scientific simulations require the numerical solution of large systems of partial differential equations. The main step in solving these equations is to discretize them, which reduces the problem to solving a number of large, sparse linear systems. Multigrid methods are very effective either as solvers for these systems or as preconditioners for other iterative solvers.

The key component of multigrid is the concept of coarse grid correction. More details can be found in [1]; the following is just a "bare bones" explanation based on it. The motivation is that the convergence of basic iterative solvers slows after a number of iterations: they quickly damp the oscillatory components of the error (the difference between the numerical solution and the actual solution), and the error that remains tends to be smooth, which such solvers reduce only slowly. However, coarsening the system being solved (removing some of the points at which we're trying to obtain a solution) makes the error more oscillatory relative to the grid. This restores the effectiveness of the basic iterative solvers, which are now used to solve the system Ae = r on the coarse grid. Here r is the residual from the fine grid, mapped to the coarse grid in a process called restriction, and e is a correction that, after being interpolated back to the original finer grid of solution points, is added to the solution on that grid. This process can be repeated on successively coarser grids.
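To make coarse grid correction concrete, here is a minimal sketch of one two-grid correction for the 1D Poisson problem -u'' = f with zero boundary values. The model problem, the weighted Jacobi smoother, the full-weighting restriction, and all function names are my own choices for illustration, not something taken from [1]. For simplicity the coarse system Ae = r is solved exactly with a tridiagonal solve; a real multigrid method would relax or recurse there instead.

```python
import math

def apply_A(u, h):
    """Apply the 1D Poisson matrix (1/h^2) * tridiag(-1, 2, -1) to u."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out

def weighted_jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Basic iterative solver: damps oscillatory error fast, smooth error slowly."""
    for _ in range(sweeps):
        Au = apply_A(u, h)
        u = [u[i] + omega * (h * h / 2.0) * (f[i] - Au[i]) for i in range(len(u))]
    return u

def restrict(r):
    """Full weighting: coarse point j sits on fine point 2j+1."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(nc)]

def interpolate(ec, n):
    """Linear interpolation from the coarse grid back to n fine points."""
    e = [0.0] * n
    for j, v in enumerate(ec):
        e[2 * j + 1] += v
        e[2 * j] += 0.5 * v
        e[2 * j + 2] += 0.5 * v
    return e

def solve_coarse(r, h):
    """Exact tridiagonal (Thomas) solve of A e = r on a grid with spacing h."""
    n = len(r)
    diag, off = 2.0 / (h * h), -1.0 / (h * h)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, r[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (r[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def two_grid(u, f, h, pre=2, post=2):
    """Relax, restrict the residual, solve A e = r coarsely, interpolate, correct, relax."""
    u = weighted_jacobi(u, f, h, pre)
    r = [fi - ai for fi, ai in zip(f, apply_A(u, h))]
    ec = solve_coarse(restrict(r), 2.0 * h)
    e = interpolate(ec, len(u))
    u = [ui + ei for ui, ei in zip(u, e)]
    return weighted_jacobi(u, f, h, post)

# With f = 0 the exact solution is 0, so the iterate itself is the error.
n = 63
h = 1.0 / (n + 1)
f = [0.0] * n
u0 = [math.sin(math.pi * (i + 1) * h) + 0.5 * math.sin(48 * math.pi * (i + 1) * h)
      for i in range(n)]
err_relax = max(abs(v) for v in weighted_jacobi(u0, f, h, 4))
err_two_grid = max(abs(v) for v in two_grid(u0, f, h))
print("relaxation only:", err_relax, " with coarse grid correction:", err_two_grid)
```

Both runs use four relaxation sweeps on the fine grid; the only difference is the coarse grid correction in the middle, which is what removes the smooth part of the error.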

From this building block, a series of grids can be defined, on which a multigrid method is then used to solve a system. Iterative solvers with certain properties that make them known as relaxation schemes (the reason for this is beyond the scope of this assignment) are applied at each grid, and the result is moved between grids using restriction or interpolation, whichever is applicable. A number of different strategies are possible for applying a multigrid method in this fashion; two of the most popular are the V-cycle and full multigrid, which are diagrammed below:
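As a rough textual stand-in for the shape of these two strategies, the sketch below lists the order in which grid levels are visited. The level numbering (0 is the finest grid) and the function names are assumptions made for illustration, and the full multigrid schedule shown is the common textbook variant that starts on the coarsest grid and runs a V-cycle after each interpolation to a finer level.

```python
def v_cycle_levels(start, coarsest):
    """Levels visited by one V-cycle: down from `start` to `coarsest`, then back up."""
    down = list(range(start, coarsest + 1))
    up = list(range(coarsest - 1, start - 1, -1))
    return down + up

def fmg_levels(coarsest):
    """Levels visited by full multigrid: begin on the coarsest grid, then
    move one level finer at a time, running a V-cycle from each new level."""
    visits = []
    for start in range(coarsest, -1, -1):
        visits += v_cycle_levels(start, coarsest)
    return visits

print("V-cycle, 4 grids:       ", v_cycle_levels(0, 3))
print("Full multigrid, 3 grids:", fmg_levels(2))
```

Relaxation happens at each visited level; a step to a larger level number is a restriction and a step to a smaller one is an interpolation.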

Multigrid was a major development. It freed the convergence rate of an iterative solver from being dependent on the problem size, enabling the efficient solution of large systems. But not all problems could benefit initially. Multigrid as described above worked on structured grids, such as a square grid of regularly spaced points used to discretize a problem domain for solving a partial differential equation using a finite difference method. Not all problem domains are nicely shaped (imagine a "blob" with a wobbly boundary and a few holes in it), and unless one is willing to do a lot of extra work, laying a highly regular grid through such a domain is not a good idea (it may not be possible at all, as the problem being solved could very easily not be defined outside of the domain of solution). Some more work was thus necessary to extend multigrid to such a situation.

Algebraic Multigrid

The idea behind algebraic multigrid is to extend the concept of multigrid to a situation like the one described above, in which a problem domain cannot readily be discretized using a regularly structured grid [1]. This means some setup steps are needed before a multigrid cycling scheme can be applied as in "classic" (termed geometric) multigrid. These steps are [3]:
  1. Select a relaxation scheme.
  2. Define the coarse grids.
  3. Define the intergrid transfer operators (restriction and interpolation).
In geometric multigrid, steps (2) and (3) are already provided by the structure of the grid on which the problem is being solved, leaving the selection of a relaxation scheme as the only precursor to starting a multigrid cycling scheme. These added steps will show up again shortly, when we look at how to parallelize algebraic multigrid.
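To give a feel for step (2), here is a minimal sketch of picking coarse grid points from the matrix entries alone, with no geometric information. It follows the general flavor of classical (Ruge-Stüben-style) coarsening and is not the specific algorithm from [3] or [4]. The strength threshold theta = 0.25, the dense list-of-lists matrix, and the function names are all assumptions made for illustration; step (3), building the interpolation weights, is not shown.

```python
def strong_neighbors(A, theta=0.25):
    """S[i] = points j that i depends on strongly, i.e. whose coupling -A[i][j]
    is at least theta times the largest off-diagonal coupling in row i."""
    n = len(A)
    S = []
    for i in range(n):
        biggest = max((-A[i][j] for j in range(n) if j != i), default=0.0)
        S.append({j for j in range(n)
                  if j != i and biggest > 0 and -A[i][j] >= theta * biggest})
    return S

def coarsen(S):
    """Greedy coarse/fine splitting: repeatedly promote the undecided point that
    strongly influences the most points, and make its dependents fine points."""
    n = len(S)
    influences = [set() for _ in range(n)]  # influences[j] = points depending on j
    for i in range(n):
        for j in S[i]:
            influences[j].add(i)
    weight = [len(influences[i]) for i in range(n)]
    undecided = set(range(n))
    coarse, fine = set(), set()
    while undecided:
        # Ties are broken by lowest index so the result is deterministic.
        c = max(sorted(undecided), key=lambda i: weight[i])
        undecided.remove(c)
        coarse.add(c)
        for i in influences[c] & undecided:
            undecided.remove(i)
            fine.add(i)
            # Points that the new fine point depends on become better candidates.
            for k in S[i] & undecided:
                weight[k] += 1
    return coarse, fine

# 1D Poisson matrix tridiag(-1, 2, -1) on 7 points, used only as a test case.
n = 7
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
S = strong_neighbors(A)
coarse, fine = coarsen(S)
print("coarse points:", sorted(coarse), " fine points:", sorted(fine))
```

On this regular test matrix the heuristic recovers every other point, which is what geometric multigrid would have chosen; the point of the algebraic version is that it never looks at where the points are.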

Parallelization

The parallelization of algebraic multigrid breaks down into two parts: parallelizing steps (2) and (3) of the setup for algebraic multigrid, and parallelizing the multigrid cycling scheme. The step of selecting a relaxation scheme is taken care of in advance, by selecting a scheme that is easily parallelized. This takes care of parallelizing the multigrid cycling scheme, as separate parts of the problem are assigned to different processors, which then apply the relaxation scheme to their part of the problem [2]. Parallelizing the other part, however, is a lot more complicated. The task is described as serial in nature [4], and this makes sense, because finding the best coarse grid representation of an unstructured fine grid in the general case requires global information.

From here on, the discussion will focus on an implementation of parallel algebraic multigrid called BoomerAMG that was developed at Lawrence Livermore National Laboratory [4]. BoomerAMG is an MPI algebraic multigrid code for distributed memory parallel machines [4]. When collecting their research data, the code's authors ran it on the Blue Pacific parallel processor at Livermore [4], which comes in at number 33 and number 364 (there are two different Blue Pacific machines at Livermore, and the authors don't specify which one they used) on the most recent top 500 supercomputer list.

For a relaxation scheme, BoomerAMG uses Gauss-Seidel iteration on the parts of the problem domain that are within the subdomain assigned to a specific processor and Jacobi iteration on the parts that are on the boundary between subdomains [2], a combination that is very easy to parallelize. As for parallelizing the process of defining the coarse grids and intergrid transfer operators, BoomerAMG can use a number of recently developed algorithms that use heuristics to make up for not using global information: the problem domain is partitioned among the processors, which then figure out the appropriate coarse grid points in their particular subdomains. Which algorithm is best depends on the structure (or lack thereof) of the grid [4]. The convergence results are good overall, but the scalability is not ideal [2,4]. This is not surprising: while the scalability of the multigrid cycling is good (it's an inherently parallel task that's been researched extensively), the scalability of the coarse grid formation ranges from poor to fair [2], which makes sense given the inherently serial nature of the ideal way to form the coarse grids. Accordingly, the authors identify the coarse grid formation algorithms as the main area for further research [4].
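The hybrid relaxation scheme described above can be sketched in a few lines. This is only a serial simulation of the idea, not BoomerAMG's MPI code: the function name, the owner_ranges parameter (one contiguous index range per simulated processor), and the small test system are all hypothetical. Each simulated processor does a Gauss-Seidel sweep over its own unknowns but uses the values from the start of the sweep for unknowns owned by others, which is the Jacobi part.

```python
def hybrid_sweep(A, b, x, owner_ranges):
    """One hybrid sweep: Gauss-Seidel inside each owner's range, Jacobi across ranges."""
    old = list(x)  # what each processor knows about the others at sweep start
    new = list(x)
    for start, stop in owner_ranges:
        for i in range(start, stop):
            total = b[i]
            for j in range(len(x)):
                if j == i or A[i][j] == 0.0:
                    continue
                # Own unknowns: latest value. Other owners' unknowns: old value.
                value = new[j] if start <= j < stop else old[j]
                total -= A[i][j] * value
            new[i] = total / A[i][i]
    return new

# Small test system tridiag(-1, 2, -1) with exact solution [1, 1, 1, 1].
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 0.0, 0.0, 1.0]
zero = [0.0] * 4

one_owner = hybrid_sweep(A, b, zero, [(0, 4)])                           # pure Gauss-Seidel
two_owners = hybrid_sweep(A, b, zero, [(0, 2), (2, 4)])                  # hybrid
four_owners = hybrid_sweep(A, b, zero, [(0, 1), (1, 2), (2, 3), (3, 4)])  # pure Jacobi

x = zero
for _ in range(200):
    x = hybrid_sweep(A, b, x, [(0, 2), (2, 4)])
print("after 200 hybrid sweeps:", x)
```

Because no range ever reads a value that another range computed during the same sweep, the ranges could be processed at the same time, with one exchange of boundary values per sweep.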

References

[1] Briggs, W.L., Henson, V.E., McCormick, S.F. A Multigrid Tutorial. 2nd ed. SIAM, 2000.

[2] Falgout, R.D., Henson, V.E., Jones, J.E., Yang, U.M. "BoomerAMG: A Parallel Implementation of Algebraic Multigrid." Presented at the 9th SIAM Conference on Parallel Processing, San Antonio, TX, March 22, 1999. [link].

[3] Henson, V.E. "An Algebraic Multigrid Tutorial." Presented at the Ninth Copper Mountain Conference on Multigrid Methods, Copper Mountain, CO, April 10, 1999. [link].

[4] Henson, V.E., Yang, U.M. "BoomerAMG: a Parallel Algebraic Multigrid Solver and Preconditioner." Lawrence Livermore National Laboratory technical report UCRL-JC-141495, 2001. [link].