APPLICATION INTRODUCTION (Turbulent Combustion DNS)

From a theoretical perspective, DNS (Direct Numerical Simulation) is the most satisfactory approach to turbulence simulation: the equations governing turbulence are solved without recourse to explicit modeling at any scale of motion [2]. Unfortunately, for a simulation to resolve the finest scales of motion in both space and time, the grid spacing and time step must be so small that a reasonably sized geometry under actual flow conditions requires an intractable number of grid nodes, while the affordable number of time steps covers only a very short span of physical time. Turbulent combustion DNS further increases the computational load by adding the complexity of multiple chemical reactions. This is why even the most ambitious DNS studies address small, industrially irrelevant reacting flows. From this perspective, DNS of entire turbine engines or other combustion devices still appears far away.
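A rough way to quantify the resolution problem is the standard Kolmogorov scaling estimate (textbook turbulence theory, not a result taken from the sources cited here): the ratio of the largest eddies L to the smallest scale η grows as Re^(3/4), so a 3D grid needs about Re^(9/4) nodes, and the number of time steps needed to follow the large eddies also grows roughly as Re^(3/4). The Python sketch below assumes every prefactor equals 1, so it shows trends rather than actual node counts.

```python
def dns_cost_estimate(reynolds: float, dims: int = 3) -> dict:
    """Order-of-magnitude DNS resolution requirements.

    Assumes Kolmogorov scaling with all prefactors set to 1:
    L/eta ~ Re**(3/4) points per direction, and time steps ~ Re**(3/4).
    """
    points_per_direction = reynolds ** 0.75
    grid_points = points_per_direction ** dims
    time_steps = reynolds ** 0.75
    return {
        "points_per_direction": points_per_direction,
        "grid_points": grid_points,
        "time_steps": time_steps,
        "work": grid_points * time_steps,  # ~ Re**3 in 3D
    }


if __name__ == "__main__":
    for reynolds in (1e3, 1e4, 1e5, 1e6):
        est = dns_cost_estimate(reynolds)
        print(f"Re = {reynolds:.0e}: {est['grid_points']:.1e} grid points, "
              f"{est['work']:.1e} point-updates")
```

With this estimate, a tenfold increase in Reynolds number multiplies the work by about 1000, before any chemistry is added.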

Nevertheless, DNS is important because its results, even on small geometries, can be used effectively to build models. To speed up the computations, resorting to parallel computing is an obvious step.

Note that, as in any CFD code, parallelism is used to distribute the grid nodes over more than one processor, dividing the physical domain into subdomains, each of which is simulated on a different processor. Each computing node must therefore exchange information with its neighbors at the grid points lying on the subdomain boundaries.
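The distribution of grid nodes can be sketched as a block decomposition: each direction of an nx x ny grid is split into nearly equal contiguous blocks, and each processor also needs to know which processors own the adjacent blocks. The Python sketch below is illustrative only; the row-major rank numbering and the non-periodic boundaries are assumptions. In an MPI code this mapping is typically obtained with MPI_Dims_create, MPI_Cart_create and MPI_Cart_shift, although the sources do not say which calls the PSC code uses.

```python
def block_range(n: int, parts: int, index: int) -> range:
    """Grid indices owned by block `index` when n points are split into `parts` blocks.

    The first n % parts blocks get one extra point, so block sizes differ by at most one.
    """
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    size = base + (1 if index < extra else 0)
    return range(start, start + size)


def subdomain(nx: int, ny: int, px: int, py: int, rank: int) -> tuple:
    """(x-range, y-range) owned by `rank` on a px x py processor grid.

    Hypothetical convention: ranks are numbered row-major, rank = i * py + j.
    """
    i, j = divmod(rank, py)
    return block_range(nx, px, i), block_range(ny, py, j)


def neighbors(px: int, py: int, rank: int) -> dict:
    """Ranks sharing a subdomain edge with `rank` (non-periodic domain assumed)."""
    i, j = divmod(rank, py)
    offsets = {"west": (-1, 0), "east": (1, 0), "south": (0, -1), "north": (0, 1)}
    result = {}
    for side, (di, dj) in offsets.items():
        ni, nj = i + di, j + dj
        if 0 <= ni < px and 0 <= nj < py:
            result[side] = ni * py + nj
    return result


if __name__ == "__main__":
    # A 10 x 7 grid on a 2 x 2 processor grid.
    for rank in range(4):
        xs, ys = subdomain(10, 7, 2, 2, rank)
        print(rank, (xs.start, xs.stop), (ys.start, ys.stop), neighbors(2, 2, rank))
```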


THE PARALLEL COMPUTING PROJECT (from [3])

The Combustion Research Facility (CRF) at Sandia National Labs (SNL) has developed, over the years, serial versions of 2D and 3D compressible-flow DNS codes, which are used effectively in basic turbulent combustion research. Researchers from the Pittsburgh Supercomputing Center (PSC) developed the parallel versions of the two codes. This well-documented project offers a unique opportunity to observe how parallel computing can overcome the speed limitations of serial turbulent combustion DNS codes.

The main parallelization task involved the following steps:

  • Restructuring and clean-up. The serial codes are restructured and cleaned up to make parallelization easier. In particular, great care is taken in designing correct input/output routines and in avoiding machine-dependent constructs.
  • Domain decomposition. As in any other parallel CFD code, the simulated domain (in DNS usually a square in 2D or a cube in 3D) is decomposed over a Cartesian grid of processors. Each processor (or PE, processing element) is responsible for a subset of the grid points and performs repetitive operations on this subset.
  • Interdomain exchange. Ghost-cell data types are set up along each subdomain boundary; communication between the nodes is then needed to exchange the values of these ghost cells.
  • Message passing. Routines for inter-node communication are added to the code. MPI is the message-passing library of choice, since high portability of the code was one of the developers' goals.
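The decomposition and ghost-cell steps above can be illustrated without MPI by a serial Python sketch on a 1D field: the field is split into blocks padded with one ghost cell per side, the ghosts are filled from the neighboring blocks, and a three-point stencil applied block by block then reproduces the serial result exactly. The 1D split, the 0.0 value at the physical boundary and the second-difference stencil are illustrative assumptions. In an MPI code, each copy in exchange_ghosts would become a message between neighboring PEs (for example via MPI_Sendrecv); the sources do not say which MPI calls the PSC code uses.

```python
def split_with_ghosts(u: list, parts: int) -> list:
    """Split a global 1D field into `parts` contiguous blocks (parts <= len(u)).

    Each local block is padded with one ghost cell on each side, initialised to 0.0.
    """
    base, extra = divmod(len(u), parts)
    local, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        local.append([0.0] + u[start:start + size] + [0.0])
        start += size
    return local


def exchange_ghosts(local: list) -> None:
    """Fill ghost cells from neighbouring blocks (the step done by message passing).

    Ghosts on the physical boundary keep the value 0.0, an assumed Dirichlet condition.
    """
    for p in range(len(local)):
        if p > 0:
            local[p][0] = local[p - 1][-2]  # left ghost <- left neighbour's last interior cell
        if p < len(local) - 1:
            local[p][-1] = local[p + 1][1]  # right ghost <- right neighbour's first interior cell


def stencil(padded: list) -> list:
    """Three-point second difference u[i-1] - 2*u[i] + u[i+1] on the interior cells."""
    return [padded[i - 1] - 2.0 * padded[i] + padded[i + 1] for i in range(1, len(padded) - 1)]


def decomposed_stencil(u: list, parts: int) -> list:
    """Apply the stencil block by block after a ghost exchange, then gather the results."""
    local = split_with_ghosts(u, parts)
    exchange_ghosts(local)
    result = []
    for block in local:
        result.extend(stencil(block))
    return result


if __name__ == "__main__":
    u = [float(i * i) for i in range(12)]
    serial = stencil([0.0] + u + [0.0])
    print(decomposed_stencil(u, 4) == serial)  # True: the exchange reproduces the serial result
```

Skipping exchange_ghosts leaves 0.0 in the interior ghost cells, and the values next to each internal block edge then differ from the serial computation.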

TARGETED PLATFORMS

The targeted platforms are:

  • Cray T3E-900
  • SGI Origin 2000
  • IBM-SP2

The Cray T3E platform (jaromir) was targeted because of its availability at PSC, while the SGI Origin 2000 (bert) is the platform available to CRF research personnel at SNL. Finally, the IBM-SP2 platform resides at PNNL (Pacific Northwest National Laboratory) [3]. A brief description of the Cray T3E-900 platform, for which the most detailed information was found, follows.

The Cray T3E-900 at PSC (taken from [4] and [5])

The Cray T3E is a scalable multiprocessor system with a logically shared but physically distributed memory, based on the DEC Alpha 21164 microprocessor. The main characteristics of the machine at PSC are listed below.

  • It has 512 high-performance PEs, each containing a CPU, memory, and a communication engine.
  • The T3E's topology is that of a three-dimensional torus.
  • Peak performance is 461 Gflop/s.
  • CPU: DEC Alpha 64-bit microprocessors, each running at 450 MHz (liquid cooled). The 450 MHz PEs have a theoretical peak speed of 900 Mflop/s each.
  • Each processor runs a CHORUS-based microkernel.
  • The memory is physically distributed, with each PE having 128 MB.
  • The Cray T3E uses the Unicos/mk operating system, which extends Unicos capabilities in parallel efficiency and scalability.
  • The T3E supports PVM3 and MPI for message passing, as well as a Cray proprietary one-sided communication library, the so-called shmem library, which is implemented close to the hardware and has a very low latency of only 1.6 µs.
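The peak figures listed above are consistent with simple arithmetic: 900 Mflop/s per PE corresponds to 450 MHz times 2 floating-point results per cycle (assumed here to be one add and one multiply per cycle on the Alpha 21164), and 512 such PEs give about 461 Gflop/s. The Python sketch below only reproduces that arithmetic, plus the aggregate memory implied by 128 MB per PE.

```python
def peak_gflops(num_pe: int, clock_mhz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Gflop/s: PEs x clock rate x floating-point results per cycle."""
    return num_pe * clock_mhz * flops_per_cycle / 1000.0


def total_memory_gb(num_pe: int, mb_per_pe: int) -> float:
    """Aggregate (physically distributed) memory in GB, with 1 GB = 1024 MB."""
    return num_pe * mb_per_pe / 1024.0


if __name__ == "__main__":
    # Assumption: 2 flops per cycle (one add and one multiply).
    print(peak_gflops(1, 450, 2))     # 0.9 Gflop/s, i.e. 900 Mflop/s per PE
    print(peak_gflops(512, 450, 2))   # 460.8 Gflop/s, quoted above as 461 Gflop/s
    print(total_memory_gb(512, 128))  # 64.0 GB in total
```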

TOP 500 POSITION for T3E-900 at PSC (from [5])

It entered the list between June and November 1998.

  • It was 17th on the November 1998 list.
    The reported peak is 486 Gflop/s, and the maximal LINPACK performance achieved is 341.20 Gflop/s.
  • It was 28th on the November 1999 list.

SCALABILITY AND PERFORMANCE OF CODE (from [3])

Scalability

Scalability results for two test cases, on a 500 x 500 grid and on a 1000 x 1000 grid, are given in the table and graph below. Scalability is better on the bigger grid (more points), as expected, since each PE then holds more interior points relative to the ghost cells it must exchange. Moreover, at the 500 PE level the achieved speedup is almost 75% of the theoretical (linear) speedup, which is a good result.
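The scalability figures can be read through two standard quantities: speedup S = T1/Tp and parallel efficiency E = S/p, so "75% of the theoretical speedup" on 500 PEs means E ≈ 0.75, i.e. S ≈ 375. The grid-size effect can be illustrated by the ratio of ghost cells to owned cells per PE. The Python sketch below is illustrative: the timings are hypothetical, the 20 x 25 processor grid for 500 PEs is an assumption, and only one ghost layer (no corners) is counted.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S_p = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, num_pe: int) -> float:
    """Parallel efficiency E_p = S_p / p (1.0 means ideal, linear speedup)."""
    return speedup(t_serial, t_parallel) / num_pe


def ghost_to_interior_ratio(nx: int, ny: int, px: int, py: int) -> float:
    """Ghost cells per owned cell for one PE of an nx x ny grid on a px x py processor grid.

    Counts one ghost layer on each of the four sides (corners ignored) and
    assumes px divides nx and py divides ny.
    """
    lx, ly = nx / px, ny / py
    return 2 * (lx + ly) / (lx * ly)


if __name__ == "__main__":
    # Hypothetical timings: 75% efficiency on 500 PEs means a speedup of 375.
    print(efficiency(375.0, 1.0, 500))                  # 0.75
    # Assumed 20 x 25 processor grid for 500 PEs:
    print(ghost_to_interior_ratio(500, 500, 20, 25))    # 0.18
    print(ghost_to_interior_ratio(1000, 1000, 20, 25))  # 0.09
```

Doubling the grid size in each direction halves the relative amount of boundary data per PE, which is one plausible reason why the 1000 x 1000 case scales better.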

[Table and graph: scalability results for the 500 x 500 and 1000 x 1000 grids, from [3].]

As a side note on performance, a brief email interview with one of the developers provided further information. The developer at PSC recalls that, for the 1000 x 1000 grid on 500 processors, the code ran at roughly 80 Mflop/s per processor, for a total of about 40 Gflop/s, which is slightly more than 10% of the maximal LINPACK performance achieved (341.20 Gflop/s).
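The sustained-performance figures can be checked with a few lines of arithmetic: 80 Mflop/s per processor on 500 processors gives 40 Gflop/s, which is about 9% of the 900 Mflop/s per-PE peak and about 12% of the 341.20 Gflop/s LINPACK figure. The Python sketch below assumes the developer's 80 Mflop/s is a per-processor rate; note also that the LINPACK value was measured on the whole machine rather than on 500 PEs, so the comparison is only indicative.

```python
def sustained_summary(per_pe_mflops: float, num_pe: int,
                      pe_peak_mflops: float, linpack_gflops: float) -> dict:
    """Compare a code's sustained rate with per-PE peak and with the LINPACK result."""
    total_gflops = per_pe_mflops * num_pe / 1000.0
    return {
        "total_gflops": total_gflops,
        "fraction_of_pe_peak": per_pe_mflops / pe_peak_mflops,
        "fraction_of_linpack": total_gflops / linpack_gflops,
    }


if __name__ == "__main__":
    s = sustained_summary(80.0, 500, 900.0, 341.2)
    print(f"{s['total_gflops']:.1f} Gflop/s")            # 40.0 Gflop/s
    print(f"{s['fraction_of_pe_peak']:.1%} of PE peak")  # 8.9% of PE peak
    print(f"{s['fraction_of_linpack']:.1%} of LINPACK")  # 11.7% of LINPACK
```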


OBTAINED RESULTS (from [3] and [6])

Two simulations run on PSC's T3E-900 platform led to a paper co-authored by Jackie Chen (SNL) [6], which provides further insight into the burning modes (premixed vs. diffusion burning) in autoigniting non-homogeneous mixtures of hydrogen in heated air. Important data were also obtained on their temporal evolution.

  • The first case study deals with the interaction of a pair of asymmetric (unequal strength) hydrogen-air premixed flames in a turbulent flow-field.
  • The second case study deals with the interaction of a pair of symmetric (equal strength) hydrogen-air premixed flames in a turbulent flow-field.

Such fundamental data helped clarify a much debated area of turbulent combustion.

Movies of case 1 and case 2 (dimensions in mm) are available in [3]:

Case 1 - Interaction of a strong flame with a weak flame.
Case 2 - Interaction of two equal-strength flames.


COMMENTS

Pros

  • Extreme care was devoted to portability. The developers at PSC realized that the customer (CRF at SNL) would later run the code on other, older machines, hence the choice of MPI for inter-node communication.
  • Good scalability of the more detailed problem (1000 x 1000 grid).
  • Clear illustration of the steps to be taken by the developers to upgrade the serial code to a parallel code. Such practice is bound to become predominant in the near future.

Cons

  • Explicit choice of static load balancing, with the domain being divided at compile time.

    This decision was taken to minimize development time, but it might become a serious obstacle to scalability. In fact, unlike non-reacting flows, turbulent combustion requires extra computation to track the chemical behavior of the species.

    Since this reactive behavior is unknown at the start of the job (for example, the flame position in the domain is generally not known a priori), some PEs might have to perform more calculations to resolve the reactions occurring in their subset of cells, while other PEs might sit idle because no reaction is ongoing in their part of the domain.

    A complete parallel combustion code should therefore provide dynamic load balancing.
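The imbalance described above can be made concrete with a toy 1D example (hypothetical numbers, not taken from [3]): a column of 100 cells in which a flame occupies 10 cells that cost 10 times more than the others. A static split into equal numbers of cells leaves one PE with most of the reaction work, whereas a cost-weighted contiguous split, one simple balancing scheme among many, evens the load. Since every PE waits for the busiest one at each step, the maximum load sets the pace.

```python
def equal_cell_partition(n: int, parts: int) -> list:
    """Static split: contiguous blocks with (almost) equal numbers of cells."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def cost_weighted_partition(costs: list, parts: int) -> list:
    """Contiguous split placing a cut where the running cost first reaches k * total / parts.

    Always returns `parts` blocks; a block may be empty if a single cell is very expensive.
    """
    total = sum(costs)
    bounds, start, running, k = [], 0, 0.0, 1
    for i, c in enumerate(costs):
        running += c
        while k < parts and running >= k * total / parts:
            bounds.append((start, i + 1))
            start = i + 1
            k += 1
    bounds.append((start, len(costs)))
    return bounds


def max_load(costs: list, bounds: list) -> float:
    """Work of the busiest PE; the other PEs wait for it at each time step."""
    return max(sum(costs[a:b]) for a, b in bounds)


if __name__ == "__main__":
    # Hypothetical column of 100 cells: a flame in cells 10-19 costs 10x more per cell.
    costs = [10.0 if 10 <= i < 20 else 1.0 for i in range(100)]
    print(max_load(costs, equal_cell_partition(100, 4)))       # 115.0
    print(max_load(costs, cost_weighted_partition(costs, 4)))  # 50.0
```

A dynamic scheme would recompute such cuts periodically as the flame moves, at the price of migrating cells (and rebuilding ghost-cell layouts) between PEs.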

REFERENCES

[1] Hinze, J. O., "Turbulence: an introduction to its mechanism and theory", New York, McGraw-Hill, 1959.
[2] Peters, N., "Turbulent Combustion", Cambridge, Cambridge University Press, 2000.
[3] http://pmw.org/~ravi/work/sandia/
[4] http://www.psc.edu/machines/cray/t3e/t3e.html
[5] http://www.top500.org
[6] Echekki, T., Chen, J. H., "High-temperature combustion in autoigniting non-homogeneous hydrogen/air mixtures", Proceedings of the Combustion Institute, vol. 29, no. 2, 2002, pp. 2061-2068.