Matei Zaharia

I'm a Ph.D. student in the UC Berkeley AMP Lab, interested in computer systems, networks, and cloud computing. My advisors are Scott Shenker and Ion Stoica. I'm supported by a Google Ph.D. fellowship.

Before joining Berkeley, I worked with Srinivasan Keshav at the University of Waterloo.

You can contact me at matei@berkeley.edu.

Projects

I focus on systems and algorithms for large-scale data-intensive computing. My projects include:

Spark: As big data analytics evolves beyond simple batch jobs, there is a need for both more complex multi-stage applications (e.g. machine learning algorithms) and more interactive ad-hoc queries. Spark provides an efficient abstraction for in-memory cluster computing called Resilient Distributed Datasets, and can run 30x faster than Hadoop for these applications. (homepage) (short paper) (NSDI'12 paper)

Mesos: Clusters are running increasingly diverse applications, from batch jobs to interactive services. Mesos is a cluster manager that efficiently supports diverse applications by letting them control their own scheduling. The project is open source in the Apache Incubator. (homepage) (NSDI'11 paper)

Multi-Resource Fairness: Life is not fair, but with a little help, your computer system can be. Fair sharing makes time-sharing between users predictable. However, past work on fair sharing considered a single resource (e.g. CPU), while cluster applications have demands across multiple resources (memory, I/O, CPU, etc.). Dominant resource fairness generalizes max-min fairness to this case. (NSDI'11 paper)
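
A minimal sketch of the idea behind dominant resource fairness, not the implementation from the paper: a user's dominant share is the largest fraction of any single resource allocated to them, and the allocator repeatedly hands one task to the user whose dominant share is currently lowest. The function name `drf_allocate`, the fixed per-task demand vectors, and the cluster sizes in the usage note are assumptions made for illustration. The sketch also assumes positive capacities and at least one non-zero demand per user.

```python
from typing import Dict, List


def drf_allocate(capacity: List[float],
                 demands: Dict[str, List[float]]) -> Dict[str, int]:
    """Return how many tasks each user gets under a greedy DRF-style policy.

    capacity: total amount of each resource in the cluster.
    demands:  per-task demand vector of each user (hypothetical input shape).
    """
    used = [0.0] * len(capacity)          # resources handed out so far
    tasks = {user: 0 for user in demands}  # tasks launched per user
    active = set(demands)                  # users whose next task may still fit

    def dominant_share(user: str) -> float:
        # Largest fraction of any one resource currently held by this user.
        return max(tasks[user] * d / c
                   for d, c in zip(demands[user], capacity))

    while active:
        # Pick the active user with the lowest dominant share
        # (ties broken by name so the result is deterministic).
        user = min(sorted(active), key=dominant_share)
        demand = demands[user]
        if all(u + d <= c for u, d, c in zip(used, demand, capacity)):
            used = [u + d for u, d in zip(used, demand)]
            tasks[user] += 1
        else:
            # Assumption of this sketch: a user whose next task no longer
            # fits is dropped, and the remaining users keep filling.
            active.remove(user)
    return tasks
```

For example, `drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]})` models a cluster with 9 CPUs and 18 GB of memory, where user A's tasks each need 1 CPU and 4 GB and user B's tasks each need 3 CPUs and 1 GB. It gives A three tasks and B two, so both end up with a dominant share of 2/3 (memory for A, CPU for B).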

MapReduce Scheduling: I've worked on several scheduling algorithms for MapReduce, including the LATE algorithm for straggler mitigation (OSDI'08) and delay scheduling for data locality (EuroSys'10). Both algorithms are now included in Hadoop. I also developed the Hadoop Fair Scheduler.

SNAP Sequence Aligner: I'm working with colleagues from Microsoft and UCSF on SNAP, a sequence alignment algorithm designed to handle the growing volume of data from high-throughput DNA sequencers. It is 10-100x faster than current tools while also being more accurate. (arXiv paper)

Publications

Full Publication List and Technical Reports

Talks

Open Source

Almost all of my work is open source.

I'm also a committer on the Apache Hadoop and Mesos projects.

Other Activities

Starting in high school, I've participated in a number of programming contests, including the International Olympiad in Informatics and the ACM International Collegiate Programming Contest. I've now stopped doing contests, but I still love algorithmic and mathematical problems.

In undergrad, I contributed to the open source real-time strategy game 0 A.D., where I worked on gameplay logic, random map generation, water rendering, and multiplayer networking.

I enjoy reading, nature, and food that is either good or free.
