Haoyuan Li

I'm a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, focusing on computer systems, big data, and cloud computing, advised by Prof. Scott Shenker and Prof. Ion Stoica. I co-created and lead Tachyon, an open source, reliable, memory-centric storage system for big data analytics. I'm also a founding committer of Apache Spark and a co-creator of Spark Streaming. Before Berkeley, I studied at Cornell University and Peking University, and worked at Conviva and Google.

You can reach me at haoyuan@cs.berkeley.edu, or via [Github] [Twitter] [LinkedIn] [Weibo]

Projects

I focus on systems and algorithms for large-scale data-intensive computing. Below is a list of open source projects that I work on:

Tachyon: A memory-centric storage system enabling reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. The project is open source and is deployed at multiple companies. It has more than 50 contributors from over 20 institutions, including Yahoo, Intel, Red Hat, and Pivotal. [SOCC 13] [Github] [Meetup]

Spark Streaming: Spark Streaming offers a high-level functional programming API, strong consistency, and efficient fault recovery. It is now part of Spark, which lets users seamlessly intermix streaming, batch, and interactive queries. [HotCloud'12] [SOSP'13] [Github]

Apache Spark: A cluster computing engine that makes data analytics fast. It provides an efficient abstraction for distributed in-memory computation. I am a founding committer of Apache Spark. [Github]

Shark: A high-speed query engine that runs Hive SQL queries on top of Spark and supports fault recovery and complex analytics (e.g. machine learning). I contributed to the integration with Tachyon. [Github]

Parallel Frequent Pattern Mining: Various algorithms have been developed to speed up frequent itemset mining. We designed a parallel FP-Growth algorithm and ran it on a cluster of several thousand machines. It became a part of Apache Mahout. [RecSys'08]

Apache Mesos and Apache Yarn: Both Mesos and Yarn are cluster resource managers. I ported Yarn to run on top of Mesos.

Tachyon, Spark Streaming, Apache Spark, Shark, and Apache Mesos are part of the Berkeley Data Analytics Stack (BDAS).

Publications

Talks

Selected Awards

Olin Fellowship, IBM Fellowship (twice), Morgan Stanley Fellowship, Beijing Outstanding Graduates, Chinese National Fellowship, Innovation Award at Peking University, Pacemaker to Outstanding Students at Peking University (three times), General Electric Fellowship, No. 11 and No. 13 in the ACM-ICPC World Finals 2005 and 2006, No. 8 in the Google Code Jam China Final.
