Haoyuan Li

I'm a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, advised by Prof. Scott Shenker and Prof. Ion Stoica. I co-created and lead Tachyon, an open source, reliable, memory-centric distributed storage system. I'm also the founder and CEO of Tachyon Nexus.

You can reach me at haoyuan@tachyonnexus.com, or via [Github] [LinkedIn] [Twitter] [Weibo].


Tachyon: A memory-centric storage system enabling reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. The project is open source and is deployed at multiple companies. It has more than 120 contributors from over 50 institutions, including Yahoo, Intel, Baidu, IBM, and Red Hat. [SoCC'13] [Github] [Meetup]

Spark Streaming: Spark Streaming offers a high-level functional programming API, strong consistency, and efficient fault recovery. It is now part of Spark, which lets users seamlessly intermix streaming, batch, and interactive queries. [HotCloud'12] [SOSP'13] [Github]

Apache Spark: A cluster computing engine that makes data analytics fast. It provides an efficient abstraction for distributed in-memory computation. I am a founding committer of Apache Spark. [Github]

Parallel Frequent Pattern Mining: Various algorithms have been developed to speed up frequent itemset mining. We designed a parallel FP-Growth algorithm and ran it on a cluster of several thousand machines. It became part of Apache Mahout. [RecSys'08]

Tachyon, Spark Streaming, Apache Spark, Shark, and Apache Mesos are all part of the Berkeley Data Analytics Stack (BDAS).


Selected Awards

Olin Fellowship, IBM Fellowship (twice), Morgan Stanley Fellowship, Beijing Outstanding Graduates, Chinese National Fellowship, Innovation Award at Peking University, Pacemaker to Outstanding Students at Peking University (three times), General Electric Fellowship, No. 11 and No. 13 in the ACM-ICPC World Finals 2005 and 2006, No. 8 in the Google Code Jam China Final.
