I am a Professor of Computer Science and Chair of the CS Division at UC Berkeley, specializing in large-scale data management infrastructure and applications (these days called "Big Data"). I work primarily in the Database (DB) and Operating Systems and Networking Technology (OSNT) areas. I am Director of the Algorithms, Machines and People Lab (AMPLab), an industry- and government-supported collaboration of students, postdocs, and faculty who specialize in data management, cloud computing, statistical machine learning, and other topics necessary for making sense of vast amounts of varied and unruly data. The AMPLab received a National Science Foundation CISE "Expeditions in Computing" award, which was announced as part of the White House Big Data Research initiative in March 2012.

A brief bio and photo for talk announcements and other PR can be found here.

Research Topics

  • Cloud-Computing/Distributed Systems
  • Mobile and Pervasive Computing
  • Data Streams/Continuous Analytics
  • Large-Scale Data Integration
  • Database System Architecture and Performance

Contact Information

Computer Science Division, EECS
465 Soda Hall #1776
Berkeley, CA 94720

Email: my address
Phone: (510) 642-1662 (voice mail only)
Fax: (510) 642-5615

Administrative Assistants:
  Kattt Atchley and Boban Zarkovich
  Phone: (510) 643-3499 and 643-0264
  Email: amp-admin@cs.berkeley.edu

Grant Assistant: Damon Hinson
  Phone: (510) 642-9982
  Email: AA address

Office Hours (spring 2014):
  Tu 3:30-4:30, Th 2:30-3:30, 449 Soda; or by appointment

Research Projects


  • AMPLab - Algorithms, Machines & People
  • BDAS - The Berkeley Data Analytics Stack
  • MLbase - Distributed Machine Learning for the Masses
  • CrowdDB - Crowdsourced Query Processing
  • Spark - Data-Intensive Cluster Computing
  • Shark - SQL + Machine Learning at Scale
  • GraphX - Scalable Graph Processing


  • PIQL - The Performance-Insightful Query Language
  • SCADS - Scalable Distributed Storage
  • Dataspaces - Pay-as-you-go Data Integration
  • BayesStore - Probabilistic Databases
  • RADLab - Cloud Computing
  • HiFi - Distributed Stream Processing
  • TelegraphCQ - Stream Processing
  • TinyDB - Sensor Networks
  • YFilter - High-volume Data Dissemination

Related Links