CS 267 Applications of Parallel Computers

This page contains my homework assignments for CS 267 Applications of Parallel Computers (Spring 2011).

Contact

Name: Sara Alspaugh

Email: alspaugh (at) cs (dot) berkeley (dot) edu

Assignment 0: Describe a Parallel Application

Bio

I am a second-year PhD student in the Computer Science Division of the EECS Department at UC Berkeley, advised by Randy Katz. I am a member of the LoCal research group, which applies computer science abstractions and methodologies to the challenges of evolving the electric power grid infrastructure. My research interests, broadly speaking, are distributed systems and networking with applications to the electric grid; more specifically, they include the design of new storage system architectures and the development of mechanisms for integrating renewable electricity sources into the grid. I am taking this course primarily to fulfill my systems+theory breadth requirement, but also to gain experience writing parallel code.

Astronomical Mosaics with Montage

Montage is an astronomy application for creating science-grade mosaics of the sky from input images [4]. Creating a mosaic involves reprojecting the input images, rectifying their backgrounds, and coadding them to produce the final output. An example directed acyclic graph of Montage workflow execution is shown in the figure below.


Figure 1: A directed acyclic graph of Montage workflow execution.
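To make the workflow structure concrete, here is a minimal sketch of how such a graph exposes parallelism. The stage names are hypothetical placeholders for a three-image mosaic, not actual Montage module names, and the assumption that each rectification step waits on every reprojection is a simplification for illustration. The sketch groups tasks into waves, where all tasks in one wave could run at the same time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage names (not real Montage module names).
# Each key maps to the set of stages that must finish before it can start.
# Assumption: every rectify step needs all reprojected images.
reprojected = {"reproject_1", "reproject_2", "reproject_3"}
dependencies = {
    "reproject_1": set(),
    "reproject_2": set(),
    "reproject_3": set(),
    "rectify_1": reprojected,
    "rectify_2": reprojected,
    "rectify_3": reprojected,
    "coadd": {"rectify_1", "rectify_2", "rectify_3"},
}


def parallel_waves(deps):
    """Group tasks into waves; tasks within a wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished
    return waves


waves = parallel_waves(dependencies)
for index, wave in enumerate(waves):
    print(f"wave {index}: {wave}")
```

In this toy graph the reprojection and rectification waves each hold three independent tasks, while the final coadd is a single task that must wait for everything before it.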

Montage is written in ANSI-compliant C and is provided as a toolkit. It runs from the command line on Linux and Unix platforms and is highly parallelizable.

Montage is extremely data-intensive, though the individual tasks it executes have short runtimes, on the order of a few minutes. In [3], the authors explore whether it is cost-effective to run scientific computing applications such as Montage on the cloud.

Montage in the Cloud

As mentioned previously, the authors in [3] simulate running Montage on the Amazon EC2 compute cloud with Amazon S3 storage [1], using a computational grid execution simulator called GridSim [2]. The Amazon EC2 cluster ranks as number 231 on the Top500 list of supercomputers.

A major weakness of this approach is that the authors do not compare their simulation results with real results. However, their simulation results show that it is indeed more cost-effective to run Montage on the cloud than to pay for dedicated resources on scientific HPC grids, at no cost to performance. The authors are able to get the execution time of Montage down to 1080 seconds on 128 EC2 processors for a mosaic of one square degree. For comparison, the Montage web site states that on a 2.3 GHz Linux processor with 1 GB of memory, a mosaic of one square degree of sky can be built in ~5600 seconds. I do not give cost figures here because the cost depends on several factors, so no single number characterizes the main result.
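As a rough sanity check on the two timings above, the sketch below computes the implied speedup and parallel efficiency. It rests on loud assumptions: the ~5600 second figure is treated as a single-processor baseline even though it was measured on different hardware, and the 1080 second figure comes from a simulation, so the result is only a ballpark and not a measured scaling number. The function and variable names are my own.

```python
def speedup(serial_seconds, parallel_seconds):
    """Ratio of single-processor runtime to parallel runtime."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, processors):
    """Speedup divided by processor count; 1.0 would be perfect scaling."""
    return speedup(serial_seconds, parallel_seconds) / processors


# Figures quoted in the text above. Assumption: the two runs are comparable,
# although the hardware differs and the parallel time is simulated.
serial_seconds = 5600.0    # ~one 2.3 GHz processor, per the Montage web site [4]
parallel_seconds = 1080.0  # 128 EC2 processors, per the simulation in [3]
processors = 128

s = speedup(serial_seconds, parallel_seconds)
e = efficiency(serial_seconds, parallel_seconds, processors)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.1%}")
```

Under these assumptions the speedup is roughly 5.2x, for a parallel efficiency of about 4%. That is consistent with a workload dominated by data movement and many short tasks, and it is one reason cost matters more than raw scaling in this comparison.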

What I found most noteworthy about this study is that it argues for the cost-effectiveness of running traditional scientific computing applications that are data-intensive but short-running on the cloud.

References

[1] Amazon Web Services. http://aws.amazon.com

[2] R. Buyya and M. Murshed. "GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing". Concurrency and Computation: Practice and Experience. Vol. 14, pp. 1175-1220. 2002.

[3] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good. "The cost of doing science on the cloud: the Montage example". In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC '08). IEEE Press, Piscataway, NJ, USA. Article 50, 12 pages. 2008.

[4] Montage: An Astronomical Image Mosaic Engine. http://montage.ipac.caltech.edu