Starting Fall 2011, I am joining the faculty of Computer & Information Science & Engineering (CISE) at the University of Florida (home of the Gators).
Check out my Gator Profile!

Electrical Engineering and Computer Sciences
University of California, Berkeley
RADlab 465 Soda Hall, #1776
Berkeley, CA 94720-1776
Homepage: http://www.cs.berkeley.edu/~daisyw/

I am a final-year Ph.D. student in the EECS department at UC Berkeley. My advisors are Michael J. Franklin, Joseph M. Hellerstein, and Minos Garofalakis. My primary research interest is in databases and systems for large-scale probabilistic data management and analysis using statistical machine learning. In my thesis work, I proposed, built, and evaluated BayesStore, a probabilistic database system that natively supports graphical models and their inference algorithms. I studied information extraction as the driving application.
- [NEW!] Feb/16/11 My Hybrid-Inference paper has been accepted to SIGMOD’2011! See you all in Athens, Greece!
- [NEW!] Feb/6/11 The Viterbi and MCMC inference implementations are included in MAD Library! GO MAD!
- [NEW!] Nov/17/10 I gave a CSAIL talk at MIT on “Querying Probabilistic Information Extraction”.
- [NEW!] Oct/29/10 My paper with the Almaden folks, “Selectivity Estimation for Extraction Operators over Text Data”, has been accepted to ICDE 2011. Woohoo!
- Aug/12/10 I will be presenting the paper “Querying Probabilistic Information Extraction” pvldb10 at VLDB 2010 in Singapore. Come to my talk!
- Mar/02/10 I am at ICDE 2010 in Long Beach, giving a talk on “Probabilistic Declarative Information Extraction”. See you there!
- Jan/13/10 I attended the Berkeley RAD Lab retreat at Lake Tahoe. It was very exciting to meet folks from many leading IT companies. I presented a poster that was a runner-up! radlab-2010sp-retreat-poster
- Jan/05/10 I am visiting University of Toronto, and giving a talk at the Database Seminar.
- Past News
- [NEW!] ICDE11, April 2011, Selectivity Estimation for Extraction Operators over Text Data icde11slides
- [NEW!] MIT, CSAIL Seminar, November 2010, Querying Probabilistic Information Extraction mit10slides
- [NEW!] VLDB10, September 2010, Querying Probabilistic Information Extraction pvldb10slides
- ICDE10, March 2010, Probabilistic Declarative Information Extraction icde10slides
- University of Toronto, DBSeminar, Jan 2010, BayesStore: Querying Probabilistic Information Extraction UofT10slides
- WebDB, June 2009, Functional Dependency Generation and Applications in Pay-As-You-Go Data Integration Systems webdb09slides
- Berkeley Machine Learning Tea, 8th May 2009, BayesStore: Supporting Statistical Models in Probabilistic Databases
- Stanford Info Lunch, 1st May 2009, Declarative Information Extraction in a Probabilistic Database System stanford09slides
- VLDB08, August 2008, BayesStore: Managing Large, Uncertain Data Repositories with Probabilistic Graphical Models vldb08slides
- Berkeley Database Seminar, 2006, Probabilistic Complex Event Triggering (PCET)
Hybrid In-Database Inference for Declarative Information Extraction sigmod11
To Appear, Proceedings of SIGMOD, 2011
Daisy Zhe Wang, Michael J. Franklin, Minos Garofalakis, Joseph M. Hellerstein, and Michael L. Wick
Selectivity Estimation for Extraction Operators over Text Data icde11 icde11slides
To Appear, Proceedings of ICDE, 2011
Daisy Zhe Wang, Long Wei, Yunyao Li, Frederick Reiss, and Shivakumar Vaithyanathan
Querying Probabilistic Information Extraction pvldb10 pvldb10slides
Proceedings of VLDB, 2010, PVLDB Vol.3
Daisy Zhe Wang, Michael J. Franklin, Minos Garofalakis, and Joseph M. Hellerstein
Probabilistic Declarative Information Extraction icde10 icde10slides TR-pdb-ie
Proceedings of ICDE, 2010, short paper
Daisy Zhe Wang, Eirinaios Michelakis, Michael J. Franklin, Minos Garofalakis, and Joseph M. Hellerstein
Functional Dependency Generation and Applications in Pay-as-you-go Data Integration Systems webdb09 webdb09slides TR-probFDgen
Proceedings of SIGMOD WebDB, 2009
Daisy Zhe Wang, Luna Dong, Anish Das Sarma, Michael J. Franklin, and Alon Halevy
BayesStore: Managing Large, Uncertain Data Repositories with Probabilistic Graphical Models vldb08a vldb08slides
Proceedings of VLDB, 2008
Daisy Zhe Wang, Eirinaios Michelakis, Minos Garofalakis, and Joseph M. Hellerstein
WebTables: Exploring the Power of Tables on the Web vldb08b
Proceedings of VLDB, 2008
Michael Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang
BayesStore: The BayesStore system is designed and built to support data analysis using graphical models and to enable ad hoc queries over the uncertainties and probabilities inherent in the data and analysis results. The fundamental ideas underlying BayesStore include: (1) creating a novel data model that treats uncertain relational data and graphical models of uncertainty as first-class objects; (2) implementing inference as a native operator in a query execution engine; (3) developing algorithms for relational operators over probabilistic models and data; and (4) devising query execution strategies that optimize across inference and relational operators. I used information extraction (IE) as the driving application for BayesStore.
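For readers unfamiliar with this kind of inference, here is a minimal sketch of Viterbi decoding for a toy token-labeling task. A plain hidden Markov model is assumed for brevity; the states, probabilities, tokens, and the function name `viterbi` are all hypothetical, and this is not the BayesStore implementation or necessarily the model it uses.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a non-empty obs list."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            p, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (p * emit_p[s][o], back)
        best.append(layer)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    return path[::-1]


# Hypothetical two-label model: is a token part of a NAME or not?
states = ["NAME", "OTHER"]
start_p = {"NAME": 0.3, "OTHER": 0.7}
trans_p = {
    "NAME": {"NAME": 0.6, "OTHER": 0.4},
    "OTHER": {"NAME": 0.2, "OTHER": 0.8},
}
emit_p = {
    "NAME": {"alice": 0.5, "smith": 0.4, "visits": 0.1},
    "OTHER": {"alice": 0.1, "smith": 0.1, "visits": 0.8},
}
labels = viterbi(["alice", "smith", "visits"], states, start_p, trans_p, emit_p)
print(labels)
```

In a probabilistic database, the interesting part is that such a decoder runs inside the query engine alongside relational operators, rather than as a separate preprocessing step.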
WebTables: As a member of the WebTables project at Google, I worked on the problem of scalably extracting the metadata of HTML tables from the entire Web. I developed statistical classifiers and rule-based detectors, which recovered the schemas of millions of HTML tables. This vast number opened up a whole new data-driven way of thinking about schemas. In addition, I developed and evaluated algorithms based on Bayes’ Theorem, which statistically derive probabilistic functional dependencies from the extracted schemas.
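As a rough illustration of what a probabilistic functional dependency can mean, the sketch below scores a candidate dependency over a toy table as the fraction of rows that agree with the most common right-hand value for their left-hand value. This simple counting measure over rows, the function name `fd_strength`, and the sample data are assumptions made for illustration; it is not the Bayes'-Theorem-based algorithm from the WebTables work, which derives dependencies from extracted schemas.

```python
from collections import Counter, defaultdict


def fd_strength(rows, lhs, rhs):
    """Fraction of rows consistent with the dependency lhs -> rhs.

    For each distinct lhs value, the most frequent rhs value is treated as
    the expected one; rows with any other rhs value count as violations.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    if not groups:
        return 1.0  # an empty table violates nothing
    consistent = sum(max(c.values()) for c in groups.values())
    total = sum(sum(c.values()) for c in groups.values())
    return consistent / total


# Hypothetical rows; the third one carries a noisy city spelling.
rows = [
    {"zip": "94720", "city": "Berkeley"},
    {"zip": "94720", "city": "Berkeley"},
    {"zip": "94720", "city": "Berkley"},
    {"zip": "32611", "city": "Gainesville"},
]
strength = fd_strength(rows, "zip", "city")
print(strength)
```

A score near 1 suggests the dependency mostly holds despite noise, which is the intuition behind attaching a probability to it instead of requiring it to hold exactly.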
SystemT: I collaborated with researchers at IBM Almaden Research on building an optimizer for SystemT, a rule-based information extraction system that uses AQL, a declarative SQL-like language. I developed estimators for the cost and the output size of text extractors, such as dictionary and regular-expression extractors. I further developed different document synopses for more accurate estimation of various statistics over text corpora. Experimental results demonstrated the accuracy of the estimators and the benefits of the optimizer.
Probabilistic Complex Event Triggering (PCET): In my work on PCET, I built an infrastructure that automatically infers and reasons about the probabilities of triggered events using a principled probabilistic model (i.e., Bayes Nets) along with the underlying noisy sensor data. I demonstrated that PCET simplifies the development process and, by using appropriate probabilistic models, boosts the accuracy of complex event-triggering systems, which deal with inherently uncertain and correlated data streams.
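To make the idea concrete, here is a minimal sketch of inferring an event probability from noisy sensor readings with a two-layer Bayes net: one hidden event and several sensors that are conditionally independent given the event. The prior, the sensor error rates, and the function name `event_posterior` are hypothetical and are not taken from PCET itself.

```python
def event_posterior(prior, sensors, readings):
    """P(event | readings) for conditionally independent binary sensors.

    sensors: list of (true_positive_rate, false_positive_rate) pairs.
    readings: list of booleans, one per sensor (True means the sensor fired).
    """
    like_event, like_no_event = prior, 1.0 - prior
    for (tpr, fpr), fired in zip(sensors, readings):
        like_event *= tpr if fired else 1.0 - tpr
        like_no_event *= fpr if fired else 1.0 - fpr
    evidence = like_event + like_no_event
    if evidence == 0.0:
        raise ValueError("readings are impossible under the model")
    return like_event / evidence


# Two hypothetical sensors with different error rates; the event is rare.
sensors = [(0.9, 0.2), (0.8, 0.1)]
both_fired = event_posterior(0.1, sensors, [True, True])
none_fired = event_posterior(0.1, sensors, [False, False])
print(both_fired, none_fired)
```

A trigger can then fire when the posterior crosses a threshold, rather than whenever any single noisy sensor reports a reading.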
Bonsai: In collaboration with David Purdy, a Ph.D. student from the statistics department, I explored the alternative of using interactive visualization as a means to cultivate the statistical modeling process over large datasets. I built Bonsai, an interactive visualization tool, and demonstrated how such a tool with different types of visualizations over the data can help in building better decision trees.
Long Wei (undergrad, fall 2009)
- Guided Long through the implementation and evaluation of various document synopses for text databases.
Michael Zhang (M.S. student, fall 2009)
- Supervised Michael through the implementation of a BayesStore demonstration.
Dwight Crow (undergrad, summer 2009)
- Guided Dwight through the feasibility study of clustering millions of HTML table schemas into domains.
Open Source Projects
PG-ML (with Milenko Petrovic): a PostgreSQL wrapper for statistical machine learning libraries
Resources
Berkeley EECS LaTeX Templates and Guides
A Parable of Modern Research
Bob has lost his keys in a room which is dark except for one brightly lit corner.
“Why are you looking under the light, you lost them in the dark!”
“I can only see here.”