The National Center for Atmospheric Research (NCAR) Community Climate Model (CCM) has gone through several versions and is now a stable, efficient, and documented atmospheric general circulation model designed for climate research on supercomputers or high-end workstations. CCM enables scientists to conduct experiments in global modeling without having to develop a complete global climate model of this complexity. It has been used by scientific institutions around the world for research in areas such as CO2 warming and climate change, climate prediction and predictability, atmospheric chemistry, paleoclimate, biosphere-atmosphere transfer, and nuclear winter.
More Information: The NCAR Community Climate Model (CCM3) or Studies
CCM/MP-2D (originally named PCCM2.1) is a parallel implementation of version 2.1 of CCM, developed in the early to mid-1990s. It uses a two-dimensional domain decomposition that allows up to 1024 processors to be used at T42L18 resolution (a 128 x 64 x 18 grid). CCM/MP-2D was originally designed for the Intel Paragon with 1024 processors and the IBM SP2 with 128 processors, but the code can easily be ported to other multiprocessors that support message passing, or run on machines distributed across a network.
More Information: History of CCM/MP-2D
The two-dimensional domain decomposition of CCM/MP-2D splits the longitude and latitude dimensions, producing longitude-latitude patches; the vertical dimension is not decomposed. Each processor is assigned two patches: one from the northern hemisphere and its reflection across the equator in the southern hemisphere. This pairing allows symmetry to be exploited in the Legendre transform used by the model's spectral transform method. The decomposition naturally defines a virtual two-dimensional processor grid, with rows representing common latitude assignments and columns representing common longitude assignments. Because every vertical column resides entirely on one processor, the physics computations are independent between processors and usually need no interprocessor communication.
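The patch assignment described above can be sketched in a few lines of Python. This is only an illustration, not the CCM/MP-2D code: the function name `patch_pair`, the 4 x 8 processor grid, the even split, and the convention that latitude index 0 is the northernmost row are all assumptions made for the example.

```python
def patch_pair(row, col, nlat=64, nlon=128, nrows=4, ncols=8):
    """Return the index ranges owned by processor (row, col).

    Hypothetical helper: assumes latitude index 0 is the northernmost
    row and that the grid divides evenly over the processor grid.
    """
    if (nlat // 2) % nrows or nlon % ncols:
        raise ValueError("this sketch assumes an even split")
    lat_per = (nlat // 2) // nrows   # latitudes per patch, per hemisphere
    lon_per = nlon // ncols          # longitudes per patch

    # Northern patch, counted down from the north pole.
    north = range(row * lat_per, (row + 1) * lat_per)
    # Its reflection across the equator: latitude j maps to nlat - 1 - j.
    south = range(nlat - (row + 1) * lat_per, nlat - row * lat_per)
    # Both patches share the same longitude range.
    lons = range(col * lon_per, (col + 1) * lon_per)
    return north, south, lons


if __name__ == "__main__":
    north, south, lons = patch_pair(row=0, col=0)
    print(list(north), list(south), list(lons))
```

Processors in the same row share `north` and `south`, and processors in the same column share `lons`, which matches the row and column roles of the virtual processor grid.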
There are some non-trivial problems with this approach. First, much of the physics cost is related to solar radiation, which causes a significant load imbalance between night and day grid points. To remedy this, each processor swaps half of its grid points with the processor in the same row holding grid points that are 180 degrees away in longitude, and swaps them back when the physics computations are complete.
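A small simulation shows why swapping with the processor 180 degrees away evens out the day and night work. It is a sketch under strong simplifying assumptions, not the model's actual exchange: daylight is idealized as exactly half of each latitude circle, only one processor row is considered, and the choice of which half each partner sends (so that each processor ends up with antipodal pairs of points) is an assumption, since the text only says that half of the points are swapped.

```python
NLON = 128   # longitudes at T42
NCOLS = 8    # hypothetical number of processor columns


def is_day(lon_index, sun_index, nlon=NLON):
    """Idealized daylight test: the half circle centered on sun_index is lit."""
    return (lon_index - sun_index + nlon // 4) % nlon < nlon // 2


def owned_longitudes(col, nlon=NLON, ncols=NCOLS):
    """Longitude indices owned by a processor column before any swap."""
    width = nlon // ncols
    return list(range(col * width, (col + 1) * width))


def swap_for_physics(nlon=NLON, ncols=NCOLS):
    """Return the longitudes each column works on during the physics."""
    half_cols = ncols // 2
    work = [owned_longitudes(c, nlon, ncols) for c in range(ncols)]
    for c in range(half_cols):
        partner = c + half_cols          # column 180 degrees away
        mine, theirs = work[c], work[partner]
        h = len(mine) // 2
        # Assumed pairing: one partner keeps both first halves, the other
        # both second halves, so each holds antipodal pairs of points.
        work[c] = mine[:h] + theirs[:h]
        work[partner] = mine[h:] + theirs[h:]
    return work


def day_counts(work, sun_index, nlon=NLON):
    """Number of daylight points handled by each column."""
    return [sum(is_day(i, sun_index, nlon) for i in pts) for pts in work]


if __name__ == "__main__":
    before = [owned_longitudes(c) for c in range(NCOLS)]
    print("before swap:", day_counts(before, sun_index=0))
    print("after swap: ", day_counts(swap_for_physics(), sun_index=0))
```

In the real model the day and night boundary also depends on latitude and season, so the balance after the swap would be approximate rather than exact.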
A second problem arises in using the semi-Lagrangian algorithm on the physical grid. For each grid point, a trajectory is calculated back in time to determine which grid cell to use in interpolating the current values. This calculation is independent between grid points, but the data needed to calculate the trajectories and to interpolate the fields might not be local to the processor holding the grid point. To fix this, the algorithm fills halo regions of sufficient thickness around each patch so that, once filled, all needed information is local to each processor. Usually, this only requires communication between the nearest neighbors in the logical processor grid. Near the poles, however, the halo region for a patch must include the entire polar cap. This requires communication between all processors assigned patches near the pole, resulting in a load imbalance in the cost of filling the halo regions between the polar and equatorial processors.
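The nearest-neighbor property of the halo fill can be illustrated in the longitude direction alone. The sketch below is hypothetical: the helper names, the 8-column layout, and the halo widths are assumptions, and it does not model the polar cap case described above. It lists the global longitude indices in a patch's halo, wrapping periodically, and reports which processor columns own them.

```python
def halo_longitudes(col, width, nlon=128, ncols=8):
    """Global longitude indices in the west and east halos of a patch.

    Longitude is periodic, so indices wrap around modulo nlon.
    """
    per = nlon // ncols
    lo, hi = col * per, (col + 1) * per
    west = [(lo - k) % nlon for k in range(width, 0, -1)]
    east = [(hi + k) % nlon for k in range(width)]
    return west, east


def halo_owners(col, width, nlon=128, ncols=8):
    """Processor columns that must send data to fill this patch's halo."""
    per = nlon // ncols
    west, east = halo_longitudes(col, width, nlon, ncols)
    return {i // per for i in west + east}


if __name__ == "__main__":
    # A thin halo only touches the two neighboring columns.
    print(halo_longitudes(0, 3), halo_owners(0, 3))
    # A halo wider than one patch reaches beyond the nearest neighbors.
    print(halo_owners(0, 20))
```

As long as the halo is no wider than one patch, only the adjacent columns are involved; a wider halo pulls in more distant processors, which is the effect that becomes severe near the poles.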
More Information: Parallel Implementation
While the serial complexity of a run of CCM/MP-2D is affected by the evolving solution, much of the cost of a day of simulation is fixed from day to day. The complexity was measured for two problem sizes, T42L18 and T170L18, with the following results:
| problem | grid | timestep | steps per day | floating point operations per day | sqrt calls in flop count | fdiv calls in flop count |
| T42L18 | 128 X 64 X 18 | 20 minutes | 72 | 59,554,603,237 | 8.3% | 4.2% |
| T170L18 | 512 X 256 X 18 | 5 minutes | 288 | 3,231,429,529,384 | 7.0% | 3.1% |
The high percentage of sqrt and fdiv calls makes it harder to use these counts to compute a meaningful flops/s rate. This interpretation problem will occur on any platform on which sqrt and fdiv are significantly slower than a floating point multiply or add.
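The arithmetic behind these counts can be made concrete with a short Python sketch. The operation counts and percentages are taken from the table above; the function names, the wall-clock time, and the relative costs assigned to sqrt and fdiv are hypothetical values chosen only to show how the interpretation changes.

```python
# Values copied from the table above.
FLOPS_PER_DAY = {"T42L18": 59_554_603_237, "T170L18": 3_231_429_529_384}
STEPS_PER_DAY = {"T42L18": 72, "T170L18": 288}
SQRT_FRACTION = {"T42L18": 0.083, "T170L18": 0.070}
FDIV_FRACTION = {"T42L18": 0.042, "T170L18": 0.031}


def flops_per_step(problem):
    """Average floating point operations in one timestep."""
    return FLOPS_PER_DAY[problem] / STEPS_PER_DAY[problem]


def sustained_gflops(problem, seconds_per_simulated_day):
    """Nominal Gflops/s, counting every operation as one flop."""
    return FLOPS_PER_DAY[problem] / seconds_per_simulated_day / 1e9


def weighted_operations(problem, sqrt_cost, fdiv_cost):
    """Operation count in multiply/add equivalents.

    sqrt_cost and fdiv_cost are assumed, platform-dependent factors:
    how many multiply/adds one sqrt or one divide is worth.
    """
    s, d = SQRT_FRACTION[problem], FDIV_FRACTION[problem]
    return FLOPS_PER_DAY[problem] * ((1 - s - d) + s * sqrt_cost + d * fdiv_cost)


if __name__ == "__main__":
    for name in FLOPS_PER_DAY:
        print(name, "flops per step:", round(flops_per_step(name)))
    # Hypothetical run: one simulated day of T42L18 in 120 seconds.
    print("nominal Gflops/s:", sustained_gflops("T42L18", 120.0))
    # Hypothetical platform where sqrt and fdiv each cost 10 multiply/adds.
    ratio = weighted_operations("T42L18", 10, 10) / FLOPS_PER_DAY["T42L18"]
    print("work relative to nominal count:", ratio)
```

With cost factors of 1 the weighted count reduces to the nominal count, so the gap between the two is a direct measure of how much a nominal flops/s figure understates the work done on a given platform.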
More Information: Serial Complexity of CCM/MP-2D