I'm a sixth-year graduate student working in the AMP Lab at UC Berkeley. I plan to graduate in the Fall of 2012 and will be doing a post-doc with the F1 team at Google starting in January 2013. My research interests broadly include large-scale distributed storage systems and cloud computing. More specifically, my thesis focuses on scale independence, an alternative to traditional cost-based optimization that guarantees predictable performance for applications querying ever-growing datasets. I am advised by Armando Fox, Michael Franklin and David Patterson.

Research Projects

PiQL: Scale Independent Relational Query Processing

Collaborators: Kristal Curtis, Tim Kraska, Nick Lanham, Stephen Tu, Armando Fox, Michael Franklin, David Patterson

Rapidly growing data volumes have led many developers to abandon traditional relational databases in favor of distributed key/value stores and map/reduce programs. While these alternatives often provide trivial scalability, they lack many of the benefits of high-level declarative languages, such as optimization and data independence. Instead, we propose extending the relational model with scale independence, a new type of data independence that ensures consistent performance for all queries in an application, independent of the data size. Our implementation, PIQL, provides a scale-independent relational system on top of existing distributed key/value stores by changing the objective function for optimization and automatically selecting and maintaining required indexes and materialized views. The PIQL system also integrates with the Scala compiler to provide language-integrated schema specification and a LINQ-like query language.

Thesis

Publications

Talks

The source code is available in the piql subproject of the SCADS repository on GitHub.


SCADS: Scalable Consistency Adjustable Data Storage

Collaborators: Peter Bodík, Tim Kraska, Nick Lanham, Gene Pang, Beth Trushkowsky, Stephen Tu, Armando Fox, Michael Franklin, David Patterson

SCADS is a research prototype key/value store written in Scala. Built using BDB-JE, its design is focused on modularity and easy deployment for running experiments. The system has served as the storage system for the director (FAST'11), the PIQL execution engine, the RAD Lab Stack, and the multi-datacenter concurrency control project.

Publications

The source code is available in the SCADS repository on GitHub.


RAD Lab Stack

Collaborators: Allen Chen, Kristal Curtis, Amber Feng, Karl He, Rean Griffith, Andy Konwinski, Justin Ma, Sunil Pedapudi, Ari Rabkin, Beth Trushkowsky, Matei Zaharia

Before the AMP Lab, I was a member of the Reliable Adaptive Distributed Systems Lab. The lab's moon-shot vision statement was to enable a single person to design, analyze, deploy and operate the next multi-million user website in only a single weekend. I led the effort to integrate the various projects of the lab, including SCADS, PIQL, Mesos, the director, Spark, and deploylib, into a single unified demo stack. At the end-of-project celebration on February 24th, 2011, we demonstrated three web applications written by undergrads, including one completed the previous weekend. Using the stack, we scaled them to 300+ EC2 instances over the course of an afternoon.

Videos

The source code is available in the demo branch of the SCADS repository on GitHub.


Other Projects

Deploylib

Deploylib is a Scala DSL for deploying experiments and other software on clusters of machines, including Amazon's EC2. It was used to run the experiments for the PIQL and director papers as well as the RAD Lab Final Demo. It provides developers with the following constructs:

The source code is available in the deploylib subproject of the SCADS repository on GitHub. Documentation is available on the wiki.