Haoyuan Li

I'm a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, advised by Prof. Scott Shenker and Prof. Ion Stoica. I co-created and lead Alluxio (formerly Tachyon), an open-source, memory-speed virtual distributed storage system. I'm also the founder and CEO of Alluxio.

You can reach me at haoyuan@alluxio.com, or on [Github] [LinkedIn] [Twitter] [Weibo]

Projects

Alluxio (formerly Tachyon): A memory-speed virtual distributed storage system. The project is open source and is deployed at multiple companies. It has more than 200 contributors from over 50 institutions, including Alibaba, Yahoo, Intel, Baidu, IBM, and Red Hat. [SOCC 13] [Github] [Meetup]

Spark Streaming: Spark Streaming offers a high-level functional programming API, strong consistency, and efficient fault recovery. It is now part of Spark, which lets users seamlessly intermix streaming, batch, and interactive queries. [HotCloud'12] [SOSP'13] [Github]

Apache Spark: A cluster computing engine that makes data analytics fast. It provides an efficient abstraction for distributed in-memory computation. I am a founding committer of Apache Spark. [Github]

Parallel Frequent Pattern Mining: Various algorithms have been developed to speed up frequent itemset mining. We designed a parallel FP-Growth algorithm and ran it on a cluster of several thousand machines. It became part of Apache Mahout. [RecSys'08]

Alluxio (formerly Tachyon), Spark Streaming, Apache Spark, Shark, and Apache Mesos are part of the Berkeley Data Analytics Stack (BDAS).

Publications

Selected Awards

Olin Fellowship, IBM Fellowship (twice), Morgan Stanley Fellowship, Beijing Outstanding Graduates, Chinese National Fellowship, Innovation Award at Peking University, Pacemaker to Outstanding Students at Peking University (three times), General Electric Fellowship, No. 11 and No. 13 in the ACM-ICPC World Finals 2005 and 2006, No. 8 in the Google Code Jam China Final.
