My name is Jimmy Su.  I am a second-year graduate student in the computer science department.  My research interest is compiler optimizations for parallel languages.  More specifically, I have been looking at optimizations for irregular array accesses in the context of the Titanium project.  My goal in taking CS267 is to learn about and develop applications that would benefit from my work on compiler optimizations for irregular problems.

 

 

 

Distributed Immersed Boundary Simulation in Titanium

 

Problem

 

The immersed boundary method is a general technique for modeling elastic boundaries immersed within a viscous, incompressible fluid.  The method has been applied to several biological and engineering systems, including large-scale models of the heart [3] and cochlea [1].  Simulations of these problems have previously been run on shared-memory machines.  Porting the algorithms to a distributed-memory machine increases the number of available processors, so that larger problems can be solved.  These simulations have the potential to improve the basic understanding of the biological systems they model and to aid in the development of surgical treatments and prosthetic devices.

 

Challenge

 

Despite the popularity of the immersed boundary method and the desire to scale the problems to accurately capture the details of the physical systems, parallelization for large-scale distributed-memory machines has proven challenging.  The primary reason is a classic tradeoff between locality and load balance that arises in distributing the immersed boundary data structure across processors.
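The tradeoff can be illustrated with a toy one-dimensional sketch.  This is not the IB package's partitioning scheme; the unit-length domain, the clustered boundary points, and all function names below are hypothetical and chosen only for illustration.  It assumes the fluid domain is cut into equal slabs, one per processor, and compares two ways of assigning boundary points: by the slab that contains them (good locality) and round-robin (good balance).

```python
import random

def slab_owner(x, num_procs):
    """Processor owning the fluid slab that contains position x in [0, 1]."""
    return min(int(x * num_procs), num_procs - 1)

def load_imbalance(owners, num_procs):
    """Maximum per-processor load divided by the mean load (1.0 is ideal)."""
    counts = [0] * num_procs
    for owner in owners:
        counts[owner] += 1
    return max(counts) / (len(owners) / num_procs)

def local_fraction(xs, owners, num_procs):
    """Fraction of boundary points stored on the processor that also owns
    the fluid slab around them, so no communication is needed to reach it."""
    hits = sum(1 for x, o in zip(xs, owners) if slab_owner(x, num_procs) == o)
    return hits / len(xs)

random.seed(0)
NUM_PROCS = 4
# Hypothetical boundary: 1000 points clustered in a small part of the domain.
xs = [random.uniform(0.4, 0.6) for _ in range(1000)]

# Strategy A: each point lives with the fluid slab that contains it.
spatial = [slab_owner(x, NUM_PROCS) for x in xs]
# Strategy B: points are dealt out evenly, ignoring their position.
round_robin = [i % NUM_PROCS for i in range(len(xs))]

for name, owners in (("spatial", spatial), ("round-robin", round_robin)):
    print(name,
          "imbalance:", load_imbalance(owners, NUM_PROCS),
          "local fraction:", local_fraction(xs, owners, NUM_PROCS))
```

In this toy setup the spatial assignment keeps every point next to its fluid data but leaves two of the four processors with no boundary work, while the round-robin assignment balances the work perfectly at the cost of placing most points away from the fluid cells they interact with.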

 

Distributed Immersed Boundary Simulation in Titanium

 

Givelberg and Yelick developed a parallel algorithm for the immersed boundary method that is designed for scalability on distributed-memory multiprocessors and clusters of SMPs [2].  The resulting software package, called IB, is implemented in Titanium [4], a Java-based language for high-performance scientific computing.  It takes advantage of the object-oriented features of Titanium to provide a framework for simulating immersed boundaries that separates the generic immersed boundary method code from the application-specific features that define the immersed boundary structure and the forces that arise from those structures.  Results showed that IB is scalable and that large-scale immersed boundary computations with the IB package are feasible.

 

Platform

 

Experiments were carried out on Seaborg, an IBM SP RS/6000 at the National Energy Research Scientific Computing Center (NERSC).  This computer ranks 9th on the Top 500 list.  It is a distributed-memory machine built from 16-processor nodes.  Currently it has 380 nodes, and each node has between 16 and 64 GB of memory.
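As a quick sanity check on the scale of the machine, the node figures quoted above can be multiplied out.  The snippet below is only back-of-the-envelope arithmetic on the numbers in the text; the variable names are illustrative, and the memory totals are bounds, since the text gives only a per-node range.

```python
# Figures quoted in the text for Seaborg.
NODES = 380
PROCS_PER_NODE = 16
MIN_GB_PER_NODE = 16
MAX_GB_PER_NODE = 64

# Total processor count across the whole machine.
total_procs = NODES * PROCS_PER_NODE

# Lower and upper bounds on total memory, assuming every node sits at
# the minimum or the maximum of the quoted per-node range.
min_total_gb = NODES * MIN_GB_PER_NODE
max_total_gb = NODES * MAX_GB_PER_NODE

print("processors:", total_procs)
print("total memory between", min_total_gb, "and", max_total_gb, "GB")
```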

 

Performance

All of the tests were run on 1, 2, 4, or 8 nodes, using a total of 16, 32, 64, or 128 processors respectively.  Table 1 summarizes the wall-clock time per time step for a number of test models, as well as the total number of floating point operations computed (in billions) when the maximal number of processors is employed.

 

 

Each processor on Seaborg has a peak performance of 1.5 GFlops.  The experiments show that the software package runs at less than 3% of peak in every configuration tested.  This is the biggest weakness of the application.  Although there is much room for improving performance, the transition from shared-memory to distributed-memory machines has already paid off: with more processors available, IB can now work on problem sizes that it could not handle before.
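To put the 3% figure in absolute terms, the sketch below computes the aggregate peak rate for each processor count used in the tests and the corresponding ceiling on the sustained rate.  It uses only the numbers quoted above (1.5 GFlops per processor, less than 3% of peak); the function names are illustrative, and the results are upper bounds implied by those numbers, not measured values.

```python
# Figures quoted in the text.
PEAK_GFLOPS_PER_PROC = 1.5      # peak rate of one Seaborg processor
FRACTION_OF_PEAK_BOUND = 0.03   # IB runs at less than 3% of peak

def aggregate_peak_gflops(num_procs):
    """Combined peak rate of num_procs processors, in GFlops."""
    return num_procs * PEAK_GFLOPS_PER_PROC

def sustained_upper_bound_gflops(num_procs):
    """Ceiling on the sustained rate implied by the 3% bound, in GFlops."""
    return aggregate_peak_gflops(num_procs) * FRACTION_OF_PEAK_BOUND

# Processor counts used in the experiments (1, 2, 4, and 8 nodes).
for procs in (16, 32, 64, 128):
    print(procs, "processors:",
          "peak", aggregate_peak_gflops(procs), "GFlops,",
          "sustained below", round(sustained_upper_bound_gflops(procs), 2), "GFlops")
```

On 128 processors, for example, the aggregate peak is 192 GFlops, so the 3% bound means the package sustains under about 5.8 GFlops.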

 

References

 

[1] R. P. Beyer. A computational model of the cochlea using the immersed boundary method. J. Comp. Phys., 98:145–162, 1992.

[2] E. Givelberg and K. Yelick. Distributed immersed boundary simulation in Titanium. Submitted.

[3] D. M. McQueen and C. S. Peskin. Shared-memory parallel vector implementation of the immersed boundary method for the computation of blood flow in the beating mammalian heart. Supercomputing, 11:213–236, 1997.

[4] K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: A high-performance Java dialect. Concurrency: Practice and Experience, 10(11-13), September-November 1998.