Michael B. Driscoll

I'm a graduate student in the Computer Science Division at UC Berkeley. I'm advised by Kathy Yelick and Armando Fox.

A Communication-Optimal N-Body Algorithm for Direct Interactions

Michael Driscoll, Evangelos Georganas, Penporn Koanantakool, Edgar Solomonik, and Katherine Yelick

Abstract: We consider the problem of communication avoidance in computing interactions between a set of particles in scenarios with and without a cutoff radius for interaction. Our strategy, which we show to be optimal in communication, divides the work in the iteration space rather than simply dividing the particles over processors, so more than one processor may be responsible for computing updates to a single particle. Similar to a force decomposition in molecular dynamics, this approach requires up to √p times more memory than a particle decomposition, but reduces communication costs by factors up to √p and is often faster in practice than a particle decomposition. We examine a generalized force decomposition algorithm that tolerates the memory-limited case, i.e., when memory can only hold c copies of the particles for c = 1, 2, ..., √p. When c = 1, the algorithm degenerates into a particle decomposition; similarly, when c = √p, the algorithm uses a force decomposition. We present a proof that the algorithm is communication-optimal and reduces critical path latency and bandwidth costs by factors of c² and c, respectively. Performance results from experiments on up to 24K cores of Cray XE-6 and IBM BlueGene/P machines indicate that the algorithm reduces communication in practice. In some cases, it even outperforms the original force decomposition approach because the right choice of c strikes a balance between the costs of collective and point-to-point communication. Finally, we extend the analysis to include a cutoff radius for direct evaluation of force interactions. We show that with a cutoff, communication optimality still holds. We sketch a generalized algorithm for multi-dimensional space and assess its performance for 1D and 2D simulations on the same systems.

PDF coming May 2013.
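To make the stated cost factors concrete, here is a hedged back-of-the-envelope model (my notation, not taken from the paper, and with constants omitted): relative to a particle decomposition (c = 1), the abstract says c-fold replication cuts per-processor bandwidth by a factor of c and critical-path latency by c².

```python
import math

def comm_costs(n, p, c):
    """Rough per-processor communication for n particles on p processors
    with c replicas of the particle set; constants and lower-order terms
    are dropped. c = 1 is a particle decomposition, c = sqrt(p) a force
    decomposition."""
    words = n / c        # bandwidth term, reduced by a factor of c
    msgs = p / c**2      # latency term, reduced by a factor of c**2
    return words, msgs

n, p = 10**6, 1024
base = comm_costs(n, p, 1)                  # particle decomposition
best = comm_costs(n, p, int(math.sqrt(p)))  # force decomposition, c = 32
```

The intermediate values 1 < c < √p are what let the algorithm trade memory for communication when a full force decomposition does not fit.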

PyGAS: A Partitioned Global Address Space Extension for Python

Michael Driscoll, Amir Kamil, Yili Zheng, Shoaib Kamil, and Katherine Yelick

Abstract: High-level, productivity-oriented languages such as Python are becoming increasingly popular in HPC applications as “glue” and prototyping code. The PGAS model offers its own productivity advantages [6], and combining PGAS and Python is a promising approach for rapid development of parallel applications. We discuss the potential benefits and challenges of a PGAS extension to Python, and we present initial performance results from a prototype implementation called PyGAS.

Available via: PDF

CS 267: Applications of Parallel Computers, Spring 2013

Fall 2011    CS262A: Advanced Topics in Computer Systems         Prof. E. Brewer       Report (PDF)
             CS294: Vectorizing Compilers                        Dr. R. Allen
Spring 2012  CS267: Applications of Parallel Computers           Prof. J. Demmel       Paper (PDF)
             CS270: Combinatorial Algorithms and Data Structures Prof. S. Rao
Fall 2012    CS284: Computer-Aided Geometric Design              Prof. C. Séquin
             CS294: Program Synthesis                            Prof. R. Bodik        Report (PDF)
Spring 2013  CS283: Graduate Computer Graphics                   Prof. R. Ramamoorthi
Fall 2013    CS294: Modern Parallel Languages                    Prof. K. Yelick
Fall 2014    Ma221: Numerical Linear Algebra                     Prof. J. Demmel
             CNM190: Advanced Digital Animation                  Prof. D. Garcia
Spring 2015  CNM190: Advanced Digital Animation                  Prof. D. Garcia       Short Film

PyGAS: A Partitioned Global Address Space Extension to Python

PyGAS is a Python extension module that provides a shared-memory abstraction for distributed, SPMD programs. The module represents the core functionality upon which higher-level libraries can be implemented, e.g. distributed array packages, graph abstractions, etc. More information can be found in our short paper. PyGAS is currently under active development and APIs may change rapidly.
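To illustrate the PGAS idea itself (not the PyGAS API — the class and method names below are illustrative inventions, and this sketch runs in a single process rather than across SPMD ranks), here is a minimal model of partitioned global addressing: a global array is block-partitioned over P "places," and any global index resolves to an (owner, local offset) pair, so any place can read or write remote elements through one shared namespace.

```python
class GlobalArray:
    """Single-process sketch of a block-partitioned global array.
    In a real PGAS runtime, each partition would live in a different
    process's memory and remote accesses would be one-sided puts/gets."""

    def __init__(self, n, places):
        self.n, self.places = n, places
        self.block = (n + places - 1) // places  # ceil(n / places)
        # each place owns one contiguous local partition
        self.local = [[0] * min(self.block, n - p * self.block)
                      for p in range(places)]

    def owner(self, i):
        """Map a global index to (owning place, local offset)."""
        return i // self.block, i % self.block

    def __getitem__(self, i):
        p, off = self.owner(i)
        return self.local[p][off]   # would be a remote read (get)

    def __setitem__(self, i, v):
        p, off = self.owner(i)
        self.local[p][off] = v      # would be a remote write (put)

ga = GlobalArray(10, places=4)  # places own 3, 3, 3, 1 elements
ga[7] = 42                      # lands at place 2, local offset 1
```

The productivity win the abstract alludes to is exactly this: code addresses one logical array while the runtime handles ownership and communication.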

OpenSubdiv with SpMV Kernels

In a forthcoming paper, we cast subdivision surface evaluation as sparse matrix-vector multiplication (SpMV). We use Pixar's OpenSubdiv as an implementation vehicle, implementing our kernels with high-performance sparse linear algebra libraries.
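The core idea can be sketched in miniature. OpenSubdiv implements Catmull-Clark subdivision on meshes; as a simpler stand-in, the sketch below (an illustration of the matrix formulation, not our actual kernels) expresses one round of Chaikin curve subdivision as a CSR sparse matrix applied to the control points, so refinement becomes a pure SpMV.

```python
def chaikin_matrix(n):
    """CSR (indptr, indices, data) for one round of Chaikin corner-cutting
    on a closed curve with n control points: each edge (P_i, P_{i+1})
    yields two new points, 3/4*P_i + 1/4*P_{i+1} and 1/4*P_i + 3/4*P_{i+1},
    so the matrix is 2n-by-n with two nonzeros per row."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        j = (i + 1) % n
        indices += [i, j]; data += [0.75, 0.25]   # first new point on edge
        indptr.append(len(indices))
        indices += [i, j]; data += [0.25, 0.75]   # second new point on edge
        indptr.append(len(indices))
    return indptr, indices, data

def spmv(indptr, indices, data, x):
    """y = A @ x for a CSR matrix A and a dense vector x."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
A = chaikin_matrix(len(square))
xs = spmv(*A, [p[0] for p in square])
ys = spmv(*A, [p[1] for p in square])
refined = list(zip(xs, ys))  # 8 points after one subdivision step
```

Once refinement is a matrix, repeated subdivision is repeated SpMV, which is exactly the operation tuned sparse linear algebra libraries accelerate.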

Logical Indexing Library

Complex memory layouts like Z-Morton curves aid performance but hinder productivity. To bridge the gap, the Logical Indexing Library reads layout descriptors and generates expressions that map from a simple, 2D space into the complex layout.
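As a concrete instance of the kind of mapping such a library must generate (the library's descriptor format is not shown here; this is just the Z-Morton case written by hand), the Z-Morton index of a 2D coordinate interleaves the bits of x and y:

```python
def morton2d(x, y):
    """Z-Morton encode: bit i of x goes to bit 2i of the code,
    bit i of y goes to bit 2i + 1."""
    code, i = 0, 0
    while x or y:
        code |= (x & 1) << (2 * i)
        code |= (y & 1) << (2 * i + 1)
        x >>= 1; y >>= 1; i += 1
    return code

def demorton2d(code):
    """Inverse mapping: recover (x, y) from a Z-Morton code."""
    x = y = i = 0
    while code:
        x |= (code & 1) << i
        y |= ((code >> 1) & 1) << i
        code >>= 2; i += 1
    return x, y
```

A program keeps indexing in the simple (x, y) space; the generated expression handles the layout, which is what preserves productivity without giving up the locality benefits of the Z-order curve.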