Matei Zaharia

New: I'm looking for an academic job! Here are my application materials.

I'm a PhD student in the UC Berkeley AMP Lab, interested in computer systems, networks, and cloud computing. My advisors are Scott Shenker and Ion Stoica. I'm supported by a Google PhD fellowship.

Before joining Berkeley, I worked with Srinivasan Keshav at the University of Waterloo.

You can contact me at matei@berkeley.edu.

Projects

I focus on systems and algorithms for large-scale data-intensive computing. My projects include:

Spark: As big data analytics evolves beyond simple batch jobs, there is a need for both more complex multi-stage applications (e.g. machine learning algorithms) and more interactive ad-hoc queries. Spark provides an efficient abstraction for in-memory cluster computing called Resilient Distributed Datasets, and can run up to 100x faster than Hadoop for these applications. (homepage) (short paper) (NSDI'12 paper)

Shark: This high-speed query engine runs Hive SQL queries on top of Spark up to 100x faster than Hive, and supports fault recovery and complex analytics (e.g. machine learning). (homepage) (tech report)

Mesos: Clusters are running increasingly diverse applications, from batch jobs to interactive services. Mesos is a cluster manager that efficiently supports diverse applications by letting them control their own scheduling. The project is open source in the Apache Incubator. (homepage) (NSDI'11 paper)

Multi-Resource Fairness: Life is not fair, but with a little help, your computer system can be, ensuring predictable time-sharing between users. However, past work on fair sharing considered a single resource (e.g. CPU), while cluster applications have demands across multiple resources (memory, IO, CPU, etc.). Dominant resource fairness generalizes max-min fairness to this case. (NSDI'11) (SIGCOMM'12)
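
As a rough illustration of the idea (a minimal sketch, not code from the papers or from any scheduler), dominant resource fairness can be approximated by progressive filling: repeatedly launch one task for the user whose dominant share, the largest fraction of any single resource that user holds, is currently smallest. The function name `drf_allocate`, the dictionary layout, and the cluster sizes below are all hypothetical choices made for this example. Dropping a user once its next task no longer fits is also a simplifying assumption.

```python
def drf_allocate(capacity, demands):
    """Return the number of tasks launched per user under a simple DRF sketch.

    capacity: total amount of each resource, e.g. {"cpu": 9, "mem": 18}
    demands:  per-task demand of each user, e.g. {"A": {"cpu": 1, "mem": 4}}
    """
    resources = list(capacity)
    for user, demand in demands.items():
        # A task that needs nothing would always fit and never terminate.
        if not any(demand[r] > 0 for r in resources):
            raise ValueError(f"user {user!r} has no positive demand")

    used = {r: 0 for r in resources}
    alloc = {u: {r: 0 for r in resources} for u in demands}
    tasks = {u: 0 for u in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any one resource currently held by this user.
        return max(alloc[user][r] / capacity[r] for r in resources)

    while active:
        # Pick the active user with the smallest dominant share
        # (ties broken by user name, an arbitrary choice).
        user = min(sorted(active), key=dominant_share)
        demand = demands[user]
        if all(used[r] + demand[r] <= capacity[r] for r in resources):
            for r in resources:
                used[r] += demand[r]
                alloc[user][r] += demand[r]
            tasks[user] += 1
        else:
            # Assumption: a user whose next task does not fit stops receiving tasks.
            active.discard(user)
    return tasks


if __name__ == "__main__":
    cluster = {"cpu": 9, "mem": 18}
    per_task = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
    print(drf_allocate(cluster, per_task))
```

In this example user A is memory-bound and user B is CPU-bound, so the sketch equalizes A's share of memory against B's share of CPU rather than comparing a single resource.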

MapReduce Scheduling: I've worked on several scheduling algorithms for MapReduce, including the LATE algorithm for straggler mitigation (OSDI'08) and delay scheduling for data locality (EuroSys'10). Both algorithms are now included in Hadoop. I also developed the Hadoop Fair Scheduler.

SNAP Sequence Aligner: To tackle the growing volume of genomic data, SNAP is a new sequence alignment algorithm that is 10-100x faster than current tools and also more accurate. (homepage) (arXiv)

Publications

2013

2012

2011

2010

Earlier

Full Publication List and Technical Reports

Talks

Open Source

Almost all of my work is open source:

I'm also a committer on the Apache Hadoop and Mesos projects.

Other Activities

Starting in high school, I've participated in a number of programming contests, including the International Olympiad in Informatics and the ACM International Collegiate Programming Contest. I've now stopped doing contests, but I still love algorithmic and mathematical problems.

In undergrad, I contributed to the open source real-time strategy game 0 A.D., where I worked on gameplay logic, random map generation, water rendering, and multiplayer networking.

I enjoy reading, nature, and food that is either good or free.
