|
|
|
 |
| |
|
|
| |
| APPLICATION
INTRODUCTION (Turbulent Combustion DNS)
From a theoretical perspective, DNS (Direct Numerical Simulation)
is the most satisfactory approach to turbulence simulation. The
equations governing turbulence are solved without recourse to explicit
modeling at any scale of motion [2]. Unfortunately, for a simulation
to resolve the finest scales of motion in both time and space,
the grid spacing and time step become so small that reasonably sized
geometries under realistic flow conditions require an intractable
number of grid nodes and allow too short an observation time. Moreover,
turbulent combustion DNS aggravates the computational load by adding
the complexity of multiple reactions. This is why even the most
daring DNS studies address small, industrially irrelevant
reacting flows. From this perspective, DNS of entire turbine engines
or other combustion devices appears far away.
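To give a feeling for how quickly the resolution requirement grows, the short sketch below applies the classical Kolmogorov scaling argument, in which the ratio of the largest to the smallest scale grows as Re^(3/4). The function name `dns_grid_estimate` is hypothetical, all constants of order one are dropped, and homogeneous isotropic turbulence is assumed, so the numbers are orders of magnitude only and are not taken from the project described here.

```python
def dns_grid_estimate(reynolds_number):
    """Rough Kolmogorov-scaling estimate of DNS resolution requirements.

    Assumes homogeneous isotropic turbulence and drops all constants of
    order one, so only the orders of magnitude are meaningful.
    """
    points_per_direction = reynolds_number ** 0.75   # L / eta ~ Re^(3/4)
    points_3d = points_per_direction ** 3            # N ~ Re^(9/4)
    return points_per_direction, points_3d


for reynolds_number in (1e3, 1e4, 1e5):
    per_dir, total = dns_grid_estimate(reynolds_number)
    print(f"Re = {reynolds_number:.0e}: ~{per_dir:.0f} points per direction, "
          f"~{total:.1e} points in 3D")
```

Under these assumptions a Reynolds number of 10^4 already calls for roughly a thousand points per direction, that is, about a billion points in 3D, before any chemistry is added.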
Nevertheless, the importance of DNS lies in the results produced
even on small geometries, which can be used effectively to build
models. To speed up the computations, recourse to parallel
computing is an obvious step.
Note that parallelism is used, as in any CFD code, to distribute
the grid nodes over more than one processor, dividing the physical
domain into areas, each of which is simulated on a different processor.
Obviously, every computing node exchanges information at the grid
points on the boundaries of its area. |
| |
| THE
PARALLEL COMPUTING PROJECT (from [3])
The Combustion Research Facility (CRF)
at Sandia National Labs (SNL)
developed, over the years, serial versions of 2D and 3D compressible-flow
DNS codes. These codes are used effectively in basic turbulent
combustion research. Researchers from the Pittsburgh Supercomputing
Center (PSC) worked
on developing parallel versions of the two codes. This project,
on which plenty of details are available, offers a unique opportunity
to observe how parallel computing addresses the speedup problems of
serial turbulent combustion DNS codes.
The main parallelization task involved the following steps:
- Restructuring and cleanup. The serial
codes are restructured and cleaned up to make parallelization easier.
In particular, great care is spent in designing correct input/output
routines and in avoiding machine dependence in the constructs.
- Domain decomposition. As in any other
parallel CFD code, the domain being simulated (in DNS
usually a square in 2D or a cube in 3D) is decomposed over a Cartesian
grid of processors. Each processor (or PE, processing element)
is responsible for a subset of the grid points and performs
repetitive operations on this subset.
- Interdomain exchange. Ghost-cell
data types are set up along the subdomain boundaries. Communication
between the nodes is then necessary to exchange
the values of these ghost cells.
- Message passing. Routines that perform
this communication are added to the code. MPI is used as the
message-passing interface of choice to achieve high portability of
the code, which was one of the developers' goals.
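
The decomposition and ghost-cell steps above can be sketched in a few lines of plain Python. This is a minimal serial illustration, not the project's code: the function names (`block_range`, `decompose`, `exchange_ghosts`), the periodic boundaries, and the one-cell ghost width are all assumptions made for the example. In an MPI code each copy in `exchange_ghosts` would be a send/receive between two PEs.

```python
def block_range(n_points, n_parts, index):
    """Half-open [start, stop) range of points owned by part `index`.

    Any remainder is spread over the first parts, one extra point each.
    """
    base, extra = divmod(n_points, n_parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def decompose(nx, ny, px, py):
    """Map each PE coordinate (i, j) on a px x py Cartesian processor
    grid to the block of the nx x ny computational grid it owns."""
    return {(i, j): (block_range(nx, px, i), block_range(ny, py, j))
            for i in range(px) for j in range(py)}


def exchange_ghosts(subdomains):
    """Fill the ghost cells of 1D subdomains from their neighbours.

    Each subdomain is a list whose first and last entries are ghost
    cells. Periodic boundaries are assumed for simplicity.
    """
    n = len(subdomains)
    for k, sub in enumerate(subdomains):
        left = subdomains[(k - 1) % n]
        right = subdomains[(k + 1) % n]
        sub[0] = left[-2]    # last interior value of the left neighbour
        sub[-1] = right[1]   # first interior value of the right neighbour


# 1000 x 1000 grid on a hypothetical 4 x 2 processor grid.
blocks = decompose(1000, 1000, 4, 2)
print(blocks[(0, 0)])        # ((0, 250), (0, 500))

# Two 1D subdomains holding the values 0..7, ghost cells still empty.
subs = [[None, 0, 1, 2, 3, None], [None, 4, 5, 6, 7, None]]
exchange_ghosts(subs)
print(subs[0])               # [7, 0, 1, 2, 3, 4]
```

Only interior values are read during the exchange, so the order in which the subdomains are visited does not matter. The same property lets a message-passing version post all its sends and receives at once.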
|
| |
| TARGETED
PLATFORMS
The targeted platforms are:
- Cray T3E-900
- SGI Origin 2000
- IBM-SP2
The Cray T3E platform (jaromir) was targeted because of its availability
at PSC, while the SGI
Origin 2000 (bert) is the platform available to CRF research personnel
at SNL. Finally, the
IBM-SP2 platform resides at PNNL
(Pacific Northwest National Laboratory) [3]. Below is a brief description
of the Cray T3E-900 platform, for which the most detailed information
was found.
The Cray T3E-900 at PSC (taken from [4]
and [5])
The Cray T3E is a scalable shared-memory multiprocessor system
based on the DEC Alpha 21164 family of microprocessors. The main
characteristics of the machine at PSC are listed below.
- It has 512 high-performance PEs, each containing a CPU, memory,
and a communication engine.
- The T3E's topology is that of a three-dimensional torus.
- Peak performance is 461 Gflop/s.
- CPU: Digital Alpha 64-bit microprocessors, each running at 450
MHz (liquid cooled). The 450 MHz PEs have a theoretical peak
speed of 900 Mflop/s each.
- Each processor runs a CHORUS-based microkernel.
- The memory is physically distributed, with each PE having 128
MB.
- The Cray T3E uses the Unicos/mk operating system, which extends
the capabilities of Unicos in parallel efficiency and scalability.
- The T3E supports PVM3 and MPI for message passing, as well as a
proprietary Cray one-sided communication library, the so-called shmem
library, which is implemented close to the hardware and shows
a very low latency of only 1.6 µs.
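
As a quick consistency check on the figures listed above, the aggregate peak and total memory follow directly from the per-PE values. The sketch below assumes two floating-point operations per clock cycle per PE, which is what makes 450 MHz correspond to 900 Mflop/s. The variable names are illustrative.

```python
n_pe = 512                 # number of processing elements
clock_mhz = 450            # clock rate of each PE
flops_per_cycle = 2        # assumption: two floating-point ops per cycle
memory_per_pe_mb = 128     # local memory of each PE

peak_per_pe_mflops = clock_mhz * flops_per_cycle
peak_total_gflops = n_pe * peak_per_pe_mflops / 1000
total_memory_gb = n_pe * memory_per_pe_mb / 1024

print(f"peak per PE : {peak_per_pe_mflops} Mflop/s")
print(f"peak total  : {peak_total_gflops:.1f} Gflop/s")
print(f"total memory: {total_memory_gb:.0f} GB")
```

The total of about 460.8 Gflop/s matches the quoted peak of 461 Gflop/s.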
|
| |
| TOP
500 POSITION for T3E-900 at PSC (from [5])
It entered the list between June and November 1998.
- It is 17th on the November 1999 list.
The reported peak is 486 Gflop/s, and the maximal LINPACK performance
achieved is 341.20 Gflop/s.
- It is 28th in November 1999.
|
| |
| SCALABILITY
AND PERFORMANCE OF CODE (from [3])
Scalability
Scalability results are shown below for two test cases, on a 500
x 500 grid and on a 1000 x 1000 grid. See table and graph below
for details. Note that scalability is better on the bigger grid
(more points). Moreover, at the 500 PE level, the achieved speedup
is almost 75% of the theoretical value, which is a good result.
As a side note on performance, a brief email interview with one
of the developers provided further information on the performance
of the code. The developer at PSC recalls that the code ran, for
the 1000 x 1000 grid on 500 processors, at roughly 80 Mflop/s per
processor, hence totaling 40 Gflop/s, which is slightly more than
10% of the maximal LINPACK achieved.
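
The arithmetic behind these figures can be reproduced with a short sketch. The helper names are hypothetical. The inputs are the values quoted on this page (500 PEs, roughly 80 Mflop/s per PE, 341.20 Gflop/s LINPACK, 900 Mflop/s peak per PE, about 75% of the ideal speedup), so the results are only as accurate as those recollections.

```python
def aggregate_gflops(n_pe, mflops_per_pe):
    """Total sustained rate in Gflop/s for n_pe processors."""
    return n_pe * mflops_per_pe / 1000.0


def speedup_from_efficiency(n_pe, efficiency):
    """Speedup over one PE implied by a given parallel efficiency."""
    return n_pe * efficiency


sustained = aggregate_gflops(500, 80.0)           # Gflop/s on 500 PEs
linpack_fraction = sustained / 341.20             # share of LINPACK result
peak_fraction_per_pe = 80.0 / 900.0               # share of per-PE peak
speedup = speedup_from_efficiency(500, 0.75)      # about 75% of ideal

print(f"sustained rate      : {sustained:.0f} Gflop/s")
print(f"fraction of LINPACK : {linpack_fraction:.1%}")
print(f"fraction of PE peak : {peak_fraction_per_pe:.1%}")
print(f"implied speedup     : {speedup:.0f} on 500 PEs")
```

With these inputs the code sustains roughly 12% of the LINPACK figure and about 9% of the per-PE theoretical peak.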
|
| |
| OBTAINED
RESULTS (from [3] and [6])
Two simulations run on PSC's T3E-900 platform produced a paper
by Jackie Chen (SNL) that provides further insight into the burning
modes (premixed vs. diffusion burning) in autoigniting non-homogeneous
mixtures of hydrogen in heated air. Important data were obtained
on the temporal evolution.
- The first case study deals with the interaction of a pair of
asymmetric (unequal strength) hydrogen-air premixed flames in
a turbulent flow-field.
- The second case study deals with the interaction of a pair of
symmetric (equal strength) hydrogen-air premixed flames in a turbulent
flow-field.
Such fundamental data helped clarify a much debated area of turbulent
combustion.
Click on the pictures below to see the movies of case 1 and case
2 (dimensions in mm). Taken from [3].
 |
 |
| |
|
Case 1 - Interaction of a strong flame
with a weak flame. |
Case 2 - Interaction of two equal
strength flames. |
|
| |
| COMMENTS
Pros
- Extreme care devoted to portability. The developers at PSC
realized that the customer (CRF at SNL) would later
run the code on other, older machines; hence the choice of MPI
for communication between processors.
- Good scalability of the more detailed problem (1000 x 1000 grid).
- Clear illustration of the steps to be taken by the developers
to upgrade the serial code to a parallel code. Such practice is
bound to become predominant in the near future.
Cons
- Explicit choice of static load balancing, with the domain being
divided at compile time.
This decision was taken to minimize development time, but it might
become a serious obstacle to scalability. In fact, unlike
nonreactive flows, turbulent combustion requires extra computation
to track the chemical behavior of the species.
Since this reactive behavior is unknown at the start of the job
(for example, the flame position in the domain is generally not known
a priori), some PEs might have to perform more calculations to
resolve the reactions occurring in their subset of cells, while
other PEs might be idle, as no reaction is ongoing in their part
of the domain.
A complete parallel combustion code should account for dynamic
load balancing.
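
The effect described in this point can be illustrated with a toy cost model. Everything in the sketch below is an assumption made for illustration: the 1D strip decomposition, the cost of 1 unit per cell for the flow solver, the extra 4 units per cell inside the flame, and the function names. It does not reproduce the behavior of the actual code. It compares equal-width strips (static balancing) with strips of equal cost (what a dynamic balancer would aim for).

```python
def column_costs(n_cols, flame_start, flame_stop, flow_cost=1.0, chem_cost=4.0):
    """Work per grid column: flow cost everywhere, plus a hypothetical
    chemistry cost for the columns in [flame_start, flame_stop)."""
    return [flow_cost + (chem_cost if flame_start <= c < flame_stop else 0.0)
            for c in range(n_cols)]


def static_loads(costs, n_pe):
    """Work per PE when the columns are split into equal-width strips."""
    base, extra = divmod(len(costs), n_pe)
    loads, start = [], 0
    for k in range(n_pe):
        stop = start + base + (1 if k < extra else 0)
        loads.append(sum(costs[start:stop]))
        start = stop
    return loads


def balanced_loads(costs, n_pe):
    """Work per PE when strip boundaries follow the cumulative cost,
    so that each PE receives about the same amount of work."""
    total = sum(costs)
    loads, k, running = [0.0] * n_pe, 0, 0.0
    for c in costs:
        # Move on once the current PE has reached its share of the work.
        if k < n_pe - 1 and running >= total * (k + 1) / n_pe:
            k += 1
        loads[k] += c
        running += c
    return loads


def balance_efficiency(loads):
    """Mean load over maximum load: 1.0 means no PE ever waits."""
    return (sum(loads) / len(loads)) / max(loads)


# Flame confined to the first quarter of a 1000-column domain, 4 PEs.
costs = column_costs(1000, flame_start=0, flame_stop=250)
print(static_loads(costs, 4), balance_efficiency(static_loads(costs, 4)))
print(balanced_loads(costs, 4), balance_efficiency(balanced_loads(costs, 4)))
```

In this contrived case the PE holding the flame does five times the work of the others, so the static split reaches an efficiency of only 0.4, while the cost-weighted split reaches 1.0. In a real run the flame moves, so the boundaries would have to be recomputed during the job.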
|
| |
| REFERENCES
| [1] |
Hinze, J. O., "Turbulence: an introduction
to its mechanism and theory", New York, McGraw-Hill, 1959. |
| [2] |
Peters, N., "Turbulent Combustion", Cambridge, Cambridge
University Press, 2000. |
| [3] |
http://pmw.org/~ravi/work/sandia/ |
| [4] |
http://www.psc.edu/machines/cray/t3e/t3e.html |
| [5] |
http://www.top500.org |
| [6] |
Echekki, T., Chen, J. H., "High-temperature
combustion in autoigniting non-homogeneous hydrogen/air mixtures",
Proceedings of the Combustion Institute, vol. 29, no. 2, 2002, pp. 2061-2068. |
|
|