PETSc: Toolkit for Scientific Applications

The PETSc toolkit is a library that helps application programmers in the scientific computing community build large applications. It provides a suite of parallel linear and nonlinear equation solvers and an interface that accommodates different coding styles and data structures. The toolkit supports three kinds of solvers: built-in solvers based on Newton and Krylov methods, hooks for external solvers such as MATLAB, and an interface that lets users extend the library with their own solvers.

PETSc promotes a fairly strict design methodology: although it is implemented in C, it makes significant use of abstraction and other features common in object-oriented languages. The interface can also be called from C++ and Fortran. In addition, PETSc provides a fair amount of profiling information, so that memory usage and other performance characteristics can be obtained from a parallel application.

PETSc is built on top of the widely used Message-Passing Interface (MPI), which lets the toolkit run on the broad range of parallel machines that support MPI. Although the use of MPI is not hidden from the user, PETSc does not require extensive knowledge of the MPI interface. The task of explicitly managing message-passing code is left to PETSc, which instead provides functions to assemble vectors and matrices and distribute them efficiently across a parallel job. Several vector and matrix forms can be assembled.
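To make the idea of distributing a vector across a parallel job concrete, the sketch below computes which contiguous range of entries each process would own. It assumes a simple block layout in which any leftover entries go to the lowest-numbered ranks. The function names `ownership_range` and `owner_of` are hypothetical and are not part of the PETSc API, and the layout is an assumption of this sketch rather than a statement about PETSc's internals.

```python
def ownership_range(global_size, num_ranks, rank):
    """Return the half-open range [start, end) of entries owned by `rank`.

    Assumed layout: contiguous blocks, with the first
    (global_size % num_ranks) ranks holding one extra entry each.
    """
    base, extra = divmod(global_size, num_ranks)
    local_size = base + (1 if rank < extra else 0)
    # Every earlier rank contributes `base` entries, plus one more
    # for each earlier rank that received an extra entry.
    start = rank * base + min(rank, extra)
    return start, start + local_size


def owner_of(global_index, global_size, num_ranks):
    """Return the rank that owns `global_index` under the same layout."""
    for rank in range(num_ranks):
        start, end = ownership_range(global_size, num_ranks, rank)
        if start <= global_index < end:
            return rank
    raise IndexError("global index out of range: %d" % global_index)
```

With 10 entries on 4 processes, for example, ranks 0 and 1 own three entries each and ranks 2 and 3 own two each. A process that sets a value outside its own range would need that value communicated to the owner, which is the bookkeeping a library-level assembly routine can hide.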

Example Application: PETSc-FUN3D

FUN3D is an unstructured mesh code developed at NASA that solves the incompressible Euler and Navier-Stokes equations; it is an example of an application that has been ported to the PETSc framework. FUN3D is used in the design optimization of airplanes and automobiles, which are characterized by irregular meshes composed of several million mesh points. Time to solution is a particularly important metric, perhaps even more so than the actual flop rate, since rapid convergence is a necessity in design optimization. The PETSc-FUN3D effort was first presented at Supercomputing '99 and was the recipient of the Gordon Bell Prize.

Serial Performance

Achieving a low time to solution requires minimizing memory references, and excellent serial performance helps to establish a theoretical peak for parallel performance. Cache performance is therefore improved using the following approaches:

  1. Interlacing improves spatial locality
  2. Structural blocking improves register reuse and hence reduces the number of loads
  3. Reordering of vertices and edges improves temporal locality
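The first item in the list above can be illustrated with a small layout sketch. It assumes each mesh vertex carries several unknowns (fields) and compares a field-by-field layout with an interlaced one. The helper names are hypothetical. Python lists only model the index arithmetic here; the cache benefit itself would appear in a compiled language, where the layout determines which values share a cache line.

```python
def field_major_index(vertex, field, num_vertices, num_fields):
    """Non-interlaced layout: all values of field 0, then field 1, ..."""
    return field * num_vertices + vertex


def interlaced_index(vertex, field, num_vertices, num_fields):
    """Interlaced layout: all fields of vertex 0, then vertex 1, ..."""
    return vertex * num_fields + field


def interlace(field_major, num_vertices, num_fields):
    """Copy a field-major array into the interlaced layout."""
    out = [None] * len(field_major)
    for v in range(num_vertices):
        for f in range(num_fields):
            src = field_major_index(v, f, num_vertices, num_fields)
            dst = interlaced_index(v, f, num_vertices, num_fields)
            out[dst] = field_major[src]
    return out


def vertex_span(index_fn, vertex, num_vertices, num_fields):
    """Number of array slots spanned by one vertex's unknowns."""
    slots = [index_fn(vertex, f, num_vertices, num_fields)
             for f in range(num_fields)]
    return max(slots) - min(slots) + 1
```

For 1000 vertices with 4 unknowns each, the unknowns of a single vertex span 3001 slots in the field-major layout but only 4 slots when interlaced, so a kernel that touches all unknowns of a vertex reads from one small neighborhood of memory.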

The resulting performance improvements are shown below, for three different architectures. In each case, the optimizations contribute to a significant improvement:

Parallel Performance

Like other major parallel applications that account for the cost of accessing data on remote processors, PETSc-FUN3D uses scatter/gather operations to partition global data so that each processor operates on a local copy. Where possible, the authors use split-phase operations, in which communication time is overlapped with computation time.
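The split-phase pattern can be sketched as a begin/compute/end sequence. The example below is a simulation under stated assumptions, not the authors' code: a background thread stands in for the network transfer, the local data is a 1D block of at least two values, and `split_phase_average` and `fetch_ghosts` are hypothetical names. PETSc itself exposes this pattern through paired calls such as VecScatterBegin and VecScatterEnd.

```python
from concurrent.futures import ThreadPoolExecutor


def split_phase_average(local, fetch_ghosts):
    """Three-point average over a local block with overlapped communication.

    `local` holds this process's values (at least two of them).
    `fetch_ghosts` is a callable returning (left_ghost, right_ghost),
    standing in for a transfer from the neighboring processes.
    """
    n = len(local)
    result = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as executor:
        # Begin phase: start the transfer and return immediately.
        pending = executor.submit(fetch_ghosts)
        # Overlap: interior points need only locally owned values.
        for i in range(1, n - 1):
            result[i] = (local[i - 1] + local[i] + local[i + 1]) / 3.0
        # End phase: block until the ghost values have arrived.
        left, right = pending.result()
    # Boundary points are the only ones that depend on remote data.
    result[0] = (left + local[0] + local[1]) / 3.0
    result[n - 1] = (local[n - 2] + local[n - 1] + right) / 3.0
    return result
```

The design point is that only the two boundary updates wait on communication. The larger the interior relative to the boundary, the more of the transfer time can be hidden behind useful work.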

In the parallel scalability results shown below (for the Cray T3E), the reported implementation efficiency measures the effect of the growing number of iterations to convergence as the number of processors increases. This efficiency remains above 80% for the largest node configuration (1024 processors), a tendency that is also visible in the near-linear aggregate Gflop/s graph. The good scalability can be partly explained by the T3E's torus network, which scales well enough to keep up with local processor performance.
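One way to separate the effect of iteration growth from other sources of lost efficiency is to split the overall fixed-size efficiency into two factors. The sketch below does this. The decomposition, the function names, and every number in the usage note are assumptions made for illustration and are not results from the PETSc-FUN3D study.

```python
def overall_efficiency(base_procs, base_time, procs, time):
    """Fixed-size parallel efficiency relative to a baseline run."""
    speedup = base_time / time
    ideal_speedup = procs / base_procs
    return speedup / ideal_speedup


def iteration_factor(base_iterations, iterations):
    """Efficiency lost purely because more iterations were needed."""
    return base_iterations / iterations


def remaining_factor(overall, iteration):
    """Efficiency left after dividing out the iteration-count effect."""
    return overall / iteration
```

As a hypothetical example, suppose a run on 128 processors takes 576 s and 27 iterations, while the same problem on 1024 processors takes 100 s and 30 iterations. The overall efficiency is then about 0.72, the iteration factor is 0.9, and the remaining factor is about 0.8, which would be attributed to communication, load imbalance, and other per-iteration costs.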

In summary, porting FUN3D to PETSc proved successful not only in terms of performance but also in terms of programmability and portability. A more thorough discussion in [3] shows how the application scales on other architectures. PETSc thus served as an excellent vehicle for allowing a complex application to produce strong results across an array of configurations.

References

[1] PETSc homepage. http://www-unix.mcs.anl.gov/petsc/petsc-2/

[2] Balay, Buschelman et al. PETSc Users Manual. ANL-95/11 Revision 2.1.5, Argonne National Laboratory, 2002.

[3] W.K. Anderson, W.D. Gropp et al. Achieving High Sustained Performance in an Unstructured Mesh CFD Application. IEEE Proceedings of Supercomputing'99, 1999.

[4] PETSc-FUN3D homepage. http://www-fp.mcs.anl.gov/petsc-fun3d/

-- ChristianBell - 28 Jan 2004