I'm a second year PhD student in EECS
at Berkeley in the Berkeley NLP Group. My advisor is Dan Klein. I'm primarily
interested in Natural Language Processing and Machine Learning, particularly
computational historical linguistics and unsupervised learning.
Previously, I was an undergrad in Symbolic Systems at Stanford
University, working in the Stanford
NLP Group with Dan
Jurafsky and Chris Manning. There I
worked on the Mimir Project, which aims to measure the effect of
funding and institutional structure on interdisciplinarity and
research. I also picked up an M.S. somewhere along the way.
Publications
Iterative Monotonically Bounded A* [bib][brief][pdf]
David Burkett, David Hall and Dan Klein
Association for the Advancement of Artificial Intelligence
@inproceedings{burkett11imba,
title = {Iterative Monotonically Bounded A*},
author = {David Burkett and David Hall and Dan Klein},
booktitle = {AAAI},
year = {2011}
}
Informed search algorithms such as A* use heuristics to focus
exploration on states with low total path cost. To the extent
that heuristics underestimate forward costs, a wider cost radius
of suboptimal states will be explored. For many weighted graphs,
however, a small distance in terms of cost may encompass a large
fraction of the unweighted graph. We present a new informed search
algorithm, Iterative Monotonically Bounded A* (IMBA*), which first
proves that no optimal paths exist in a bounded cut of the graph
before considering larger cuts. We prove that IMBA* has the same
optimality and completeness guarantees as A* and, in a
non-uniform pathfinding application, we empirically demonstrate
substantial speed improvements over classic A*.
Large-Scale Cognate Recovery [bib][brief][pdf]
David Hall and Dan Klein
Empirical Methods in Natural Language Processing
@inproceedings{hall11largescale,
title = {Large-Scale Cognate Recovery},
author = {David Hall and Dan Klein},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2011}
}
We present a system for the large scale induction of
cognate groups. Our model explains the evolution of cognates as a
sequence of mutations and innovations along a phylogeny. On the
task of identifying cognates from over 21,000 words in 218 different
languages from the Oceanic language family, our model achieves a
cluster purity score over 91%, while maintaining pairwise recall
over 62%.
Finding Cognate Groups Using
Phylogenies [bib][brief][pdf]
David Hall and Dan Klein
Association for Computational Linguistics, Uppsala 2010.
@inproceedings{hall10cognates,
title = {Finding Cognates using Phylogenies},
author = {David Hall and Dan Klein},
booktitle = {Association for Computational Linguistics (ACL)},
year = {2010}
}
We automate the
comparative method from historical linguistics by modeling the evolution
of words from a source language through their phylogenetic tree. Using this model, we can automatically
identify words which are cognate with one another. We also introduce mechanisms for making
inference over string-valued random variables more tractable.
Labeled LDA: A
supervised topic
model for credit attribution [pdf]
Daniel Ramage, David Hall, Ramesh Nallapati and Christopher D. Manning.
Empirical Methods in Natural Language Processing, Singapore 2009.
Studying the History of Ideas Using Topic Models [pdf]
David Hall, Dan Jurafsky, and Christopher D. Manning.
Empirical Methods in Natural Language Processing, Honolulu, 2008.
Learning Alignments and Leveraging Natural Logic [pdf]
Nathanael Chambers, Daniel Cer, Trond Grenager, David Hall, Chloe
Kiddon, Bill MacCartney, Marie-Catherine de Marneffe, Daniel Ramage,
Eric Yeh and Christopher D. Manning.
ACL Workshop on Textual Entailment and Paraphrase, Prague, 2007.
Undergraduate Thesis
Tracking the Evolution of Science[pdf] Honors Thesis. (Advisors: Dan
Jurafsky and Christopher Manning.) 2008.
Projects
ScalaNLP
A set of tools for doing NLP, Machine Learning, and whatever else
entertains me, in the lovely Scala programming language. Joint with Daniel Ramage.
Historical Linguistics
We're working on automatic discovery of cognate groups and the
reconstruction of ancient word forms. We're also looking into
reconstruction of semantics, morphology, and maybe even syntax.
Overmind
A project to build an agent for playing the game StarCraft. Our goals for this project
include the creation of a robust, scalable system that can emulate many different human-like styles of play at different skill levels.
Tolstoy at the Limits
A never-ending project (with Folahan
Olowoyeye) to derive a precise understanding of what is meant by
Tolstoy's calculus of history in War
and Peace. Current avenues of exploration include non-parametric
Bayesian statistics, measure theory, and elementary calculus.
CS107: Programming Paradigms
Autumn and Spring, 2007 and 2008. (TA) C Memory Management. Scheme.
Concurrency. Code Generation. This class is not what is used to be, for
better and for worse.
CS92SI: Explorations on OCaml
Spring, 2007. (Course Leader) Basics of the OCaml programming language.
Thinking in terms of modules and lambdas.
CS93SI: Modern C++ Techniques
Spring, 2006. (Course Leader) C++ Templates and Factories. Templates
and LISP. Exceptions. Templates and more templates.