The key component of multigrid is coarse grid correction. More details can be found in [1]; the following is just a "bare bones" explanation based on it. The motivation is that the convergence of basic iterative solvers slows after a number of iterations. These solvers quickly damp the oscillatory components of the error (the difference between the numerical solution and the actual solution at each point), so the error that remains tends to be smooth, and smooth error is exactly what they reduce slowly. However, coarsening the system being solved (removing some of the points at which we're trying to obtain a solution) makes the smooth error look more oscillatory relative to the grid spacing. This restores the effectiveness of the basic iterative solvers. On the coarse grid they are used to solve the system Ae = r for a correction e, where r is the fine grid residual mapped to the coarse grid by a process called restriction. The correction is then interpolated back to the original finer grid of solution points and added to the solution there. This process can be repeated.
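To make the steps of coarse grid correction concrete, here is a minimal sketch in Python. It is not taken from [1]. It assumes a 1D model problem (-u'' = f with zero boundary values on a uniform grid), weighted Jacobi as the basic iterative solver, full weighting for restriction, and linear interpolation. All function names are made up for illustration, and the coarse system Ae = r is only solved approximately, by running many Jacobi sweeps on the coarse grid.

```python
import math

def apply_A(u, h):
    """Apply the 1D model operator -u'' (zero boundary values) to u."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out

def residual(u, f, h):
    """r = f - A u on the current grid."""
    Au = apply_A(u, h)
    return [f[i] - Au[i] for i in range(len(u))]

def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi, standing in for the 'basic iterative solver'."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        # The diagonal of A is 2/h^2, so D^{-1} r = (h^2 / 2) r.
        u = [u[i] + omega * 0.5 * h * h * r[i] for i in range(len(u))]
    return u

def restrict(r):
    """Full weighting: 2m+1 fine grid values -> m coarse grid values."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]

def interpolate(e, n_fine):
    """Linear interpolation: m coarse grid values -> 2m+1 fine grid values."""
    out = [0.0] * n_fine
    for j, ej in enumerate(e):
        out[2 * j + 1] += ej        # fine point that coincides with coarse point j
        out[2 * j] += 0.5 * ej      # in-between points get half of each coarse neighbour
        out[2 * j + 2] += 0.5 * ej
    return out

def two_grid(u, f, h, pre=3, post=3, coarse_sweeps=500):
    """One coarse grid correction step with relaxation before and after."""
    u = jacobi(u, f, h, pre)                      # relax on the fine grid
    r_c = restrict(residual(u, f, h))             # restriction of the residual
    e_c = jacobi([0.0] * len(r_c), r_c, 2.0 * h,  # approximately solve A e = r
                 coarse_sweeps)
    e = interpolate(e_c, len(u))                  # bring the correction back
    u = [u[i] + e[i] for i in range(len(u))]      # apply it to the fine solution
    return jacobi(u, f, h, post)

def max_norm(v):
    return max(abs(x) for x in v)

if __name__ == "__main__":
    # With f = 0 the exact solution is 0, so u itself is the error.
    n = 31                       # assumed odd so the grid coarsens cleanly
    h = 1.0 / (n + 1)
    f = [0.0] * n
    smooth_error = [math.sin(math.pi * (i + 1) * h) for i in range(n)]
    print("error after 6 Jacobi sweeps:", max_norm(jacobi(smooth_error, f, h, 6)))
    print("error after one two-grid step:", max_norm(two_grid(smooth_error, f, h)))
```

Starting from a smooth error and a zero right-hand side makes the comparison easy to read: the six plain Jacobi sweeps barely change the error, while the two-grid step (which also spends six sweeps on the fine grid) removes most of it.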
From this building block, a series of progressively coarser grids can be defined, on which a multigrid method is then used to solve a system. Iterative solvers with certain properties that make them known as relaxation schemes (the reason for this is beyond the scope of this assignment) are applied at each grid, and the result is moved between grids using restriction (when moving to a coarser grid) or interpolation (when moving to a finer grid), whichever is applicable. A number of different strategies are possible for applying a multigrid method in this fashion; two of the most popular are the V-cycle and full multigrid, which are diagrammed below:
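As a complement to the diagrams, the two cycling strategies can also be written as short recursive functions. The sketch below makes the same kind of assumptions as any toy example: a 1D model problem (-u'' = f with zero boundary values), a grid with 2^k - 1 interior points so that it coarsens down to a single point, Gauss-Seidel as the relaxation scheme, full weighting for restriction, and linear interpolation. The function names are invented for illustration and do not come from any of the references.

```python
import math

def relax(u, f, h, sweeps):
    """Gauss-Seidel sweeps for the 1D model problem; updates u in place."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (left + right + h * h * f[i])
    return u

def residual(u, f, h):
    """r = f - A u, with A the usual (2, -1, -1) / h^2 stencil."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r

def restrict(v):
    """Full weighting: 2m+1 fine grid values -> m coarse grid values."""
    m = (len(v) - 1) // 2
    return [0.25 * v[2 * j] + 0.5 * v[2 * j + 1] + 0.25 * v[2 * j + 2]
            for j in range(m)]

def interpolate(v):
    """Linear interpolation: m coarse grid values -> 2m+1 fine grid values."""
    out = [0.0] * (2 * len(v) + 1)
    for j, vj in enumerate(v):
        out[2 * j + 1] += vj
        out[2 * j] += 0.5 * vj
        out[2 * j + 2] += 0.5 * vj
    return out

def v_cycle(u, f, h, nu=2):
    """Relax, recurse on the coarse grid for a correction, relax again."""
    if len(u) == 1:
        return [0.5 * h * h * f[0]]          # one unknown: solve exactly
    u = relax(list(u), f, h, nu)             # going down the "V"
    r_c = restrict(residual(u, f, h))
    e_c = v_cycle([0.0] * len(r_c), r_c, 2.0 * h, nu)
    e = interpolate(e_c)
    u = [u[i] + e[i] for i in range(len(u))]
    return relax(u, f, h, nu)                # coming back up the "V"

def full_multigrid(f, h, nu=2):
    """Solve on the coarsest grid first, then use each coarse solution
    as the starting guess for a V-cycle on the next finer grid."""
    if len(f) == 1:
        return [0.5 * h * h * f[0]]
    u = interpolate(full_multigrid(restrict(f), 2.0 * h, nu))
    return v_cycle(u, f, h, nu)

if __name__ == "__main__":
    n = 31                                   # 2**5 - 1 interior points
    h = 1.0 / (n + 1)
    f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
    u = [0.0] * n
    for cycle in range(5):
        u = v_cycle(u, f, h)
        print(cycle + 1, max(abs(r) for r in residual(u, f, h)))
```

The only difference between the two strategies in this sketch is where the starting guess comes from: the V-cycle is handed one, while full multigrid builds its own by working up from the coarsest grid.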

Multigrid was a major development. It freed the convergence rate of an iterative solver from being dependent on the problem size, enabling the efficient solution of large systems. But not all problems could benefit initially. Multigrid as described above worked on structured grids, such as a square grid of regularly spaced points used to discretize a problem domain for solving a partial differential equation using a finite difference method. Not all problem domains are nicely shaped (imagine a "blob" with a wobbly boundary and a few holes in it), and unless one is willing to do a lot of extra work, laying a highly regular grid through such a domain is not a good idea (it may not be possible at all, as the problem being solved could very easily not be defined outside of the domain of solution). Some more work was thus necessary to extend multigrid to such a situation.
From here on, the discussion will focus on BoomerAMG, an implementation of parallel algebraic multigrid that was developed at Lawrence Livermore National Laboratory [4]. BoomerAMG is an MPI code for distributed memory parallel machines [4]. When collecting their research data, the code's authors ran it on the Blue Pacific parallel processor at Livermore [4], which comes in at number 33 and number 364 on the most recent TOP500 supercomputer list (there are two different Blue Pacific machines at Livermore, and the authors don't specify which one they used). For a relaxation scheme, BoomerAMG uses Gauss-Seidel iteration on the parts of the problem domain that lie within the subdomain assigned to a specific processor and Jacobi iteration on the parts that lie on the boundary between subdomains [2]. This hybrid scheme is very easy to parallelize. To parallelize the definition of the coarse grids and the intergrid transfer operations, BoomerAMG can use a number of recently developed algorithms. The problem domain is partitioned among the processors, each of which figures out the appropriate coarse grid points in its own subdomain, and heuristics make up for the lack of global information. Which algorithm works best depends on the structure (or lack thereof) of the grid [4]. The convergence results are good overall, but the scalability is not ideal [2,4]. This is not surprising: while the scalability of the multigrid cycling is good (it's an inherently parallel task that's been researched extensively), the scalability of the coarse grid formation ranges from poor to fair [2], which makes sense because the ideal way to form the coarse grids is inherently serial. It's no surprise, then, that the authors identify the coarse grid formation algorithms as the main area for further research [4].
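The relaxation scheme described above can be imitated on a single machine to see why it parallelizes so easily. The following is only a serial sketch under stated assumptions, not BoomerAMG's code: there is no MPI, the matrix is a small dense list of lists, each "processor" is just a block of row indices, and the name `hybrid_sweep` is made up. Within a block the sweep uses the newest values (Gauss-Seidel); for unknowns owned by another block it uses the values from the start of the sweep (Jacobi), which is one common way to read the description in [2].

```python
def hybrid_sweep(A, b, u, blocks):
    """One relaxation sweep over all blocks.

    A      : dense square matrix as a list of rows
    b, u   : right-hand side and current iterate
    blocks : list of index collections, one per simulated processor
    """
    old = list(u)   # what each block sees for unknowns it does not own
    new = list(u)
    for block in blocks:            # the blocks do not depend on each other's
        owned = set(block)          # new values, so they could run in parallel
        for i in block:
            s = b[i]
            for j, a_ij in enumerate(A[i]):
                if j != i and a_ij != 0.0:
                    # newest value inside the block, start-of-sweep value outside
                    s -= a_ij * (new[j] if j in owned else old[j])
            new[i] = s / A[i][i]
    return new

def poisson_matrix(n):
    """Small test matrix: the 1D (2, -1, -1) stencil, used only for the demo."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    n = 8
    A = poisson_matrix(n)
    b = [1.0] * n
    u = [0.0] * n
    two_processors = [range(0, 4), range(4, 8)]
    for _ in range(200):
        u = hybrid_sweep(A, b, u, two_processors)
    print(u)
```

Two limiting cases are a useful sanity check on the idea: with a single block containing every index the sweep is ordinary Gauss-Seidel, and with one index per block it is ordinary Jacobi.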
[2] Falgout, R.D., Henson, V.E., Jones, J.E., Yang, U.M. "BoomerAMG: A Parallel Implementation of Algebraic Multigrid." Presented at the 9th SIAM Conference on Parallel Processing, San Antonio, TX, March 22, 1999. [link].
[3] Henson, V.E. "An Algebraic Multigrid Tutorial." Presented at the Ninth Copper Mountain Conference on Multigrid Methods, Copper Mountain, CO, April 10, 1999. [link].
[4] Henson, V.E., Yang, U.M. "BoomerAMG: a Parallel Algebraic Multigrid Solver and Preconditioner." Lawrence Livermore National Laboratory technical report UCRL-JC-141495, 2001. [link].