Charles Sutton
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Email: casutton AT eecs.berkeley.edu
Publications
About Me
I am a postdoctoral researcher in the RAD Lab
at UC Berkeley, working on probabilistic techniques for monitoring, control, and diagnosis of complex distributed systems. I work with Michael I. Jordan.
Previously, my thesis work applied statistical machine learning techniques to
problems in natural language.
Many problems in NLP—such as
information extraction, speech processing, and machine
translation—require learning to predict a structured object with
complex probabilistic dependencies among its parts. In my thesis, my focus was on
statistical learning methods for structured prediction problems
that scale well to large data sets, in particular,
to efficient training of conditional random fields.
I investigated training algorithms that make use of approximate inference
techniques for graphical models, mainly variational approaches.
Recent Publications
My full list of publications is available. Or you might be interested in these recent highlights:
-
Unsupervised Deduplication using Cross-field Dependencies. Robert Hall, Charles Sutton, Andrew McCallum.
In Conference on Knowledge Discovery and Data Mining (KDD). 2008.
(Hierarchical DP model that jointly clusters citation venue strings based on both string-edit distance and title information.)
[ .pdf |
bib
| abstract
]
Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes collectively, they do not model them collectively. For example, in citations in the research literature, canonical venue strings and title strings are dependent---because venues tend to focus on a few research areas---but this dependence is not modeled by current unsupervised techniques. We call this dependence between fields in a record a cross-field dependence. In this paper, we present an unsupervised generative model for the deduplication problem that explicitly models cross-field dependence. Our model uses a single set of latent variables to control two disparate clustering models: a Dirichlet-multinomial model over titles, and a non-exchangeable string-edit model over venues. We show that modeling cross-field dependence yields a substantial improvement in performance---a 58% reduction in error over a standard Dirichlet process mixture.
@InProceedings{ hall08unsupervised,
title = {Unsupervised Deduplication using Cross-field Dependencies},
author = {Robert Hall and Charles Sutton and Andrew McCallum},
year = {2008},
booktitle = {Conference on Knowledge Discovery and Data Mining (KDD)}
}
-
An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton, Andrew McCallum.
In Lise Getoor and Ben Taskar, editors. Introduction to Statistical Relational Learning. MIT Press. 2007.
(Detailed tutorial on conditional random fields. Includes motivation, background, mathematical foundations, linear-chain form, general-structure form, inference, parameter estimation, and tips and tricks.)
[ .pdf |
bib
]
@InCollection{ sutton07introduction,
title = {An Introduction to Conditional Random Fields for Relational Learning},
author = {Charles Sutton and Andrew McCallum},
publisher = {MIT Press},
editor = {Lise Getoor and Ben Taskar},
year = {2007},
booktitle = {Introduction to Statistical Relational Learning}
}
-
Piecewise Training for Structured Prediction. Charles Sutton, Andrew McCallum.
In submission . 2008.
(Train undirected graphical model by splitting into overlapping parts that are trained independently. Connections to pseudolikelihood and Bethe free energy. Journal version of UAI and ICML papers below.)
[ .pdf |
bib
| abstract
]
A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training that divides the factors into tractable subgraphs, which we call pieces, that are trained independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension to piecewise training, called piecewise pseudolikelihood, designed for when variables have large cardinality. On several real-world NLP data sets, piecewise training performs superior to Besag's pseudolikelihood and sometimes comparably to exact maximum likelihood. In addition, PWPL performs similarly to piecewise and superior to standard pseudolikelihood, but is five to ten times more computationally efficient than batch maximum likelihood training.
@Article{ sutton08piecewise,
journal = {In submission},
title = {Piecewise Training for Structured Prediction},
author = {Charles Sutton and Andrew McCallum},
year = {2008}
}
-
Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors. Hanna Wallach, Charles Sutton, Andrew McCallum.
In ICML Workshop on Prior Knowledge for Text and Language Processing. 2008.
(Two Bayesian dependency parsing models: 1. Model with Pitman-Yor prior that significantly improves Eisner's classic model; 2. Latent-variable model that learns "syntactic" topics.)
[ .pdf |
bib
]
@InProceedings{ wallach08bayesian,
title = {Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors},
author = {Hanna Wallach and Charles Sutton and Andrew McCallum},
year = {2008},
booktitle = {ICML Workshop on Prior Knowledge for Text and Language Processing}
}
Finally, I have a collection of brief, tutorial-style research
notes.
Advisors, Mentors, Collaborators
Software I Write
- I have contributed to MALLET,
a Java package for machine learning on text. MALLET includes implementations
of several classification algorithms, including logistic regression, decision
trees, and Winnow, and an implementation of conditional random fields.
- I have also written GRMM, a Java toolkit for inference in graphical models
with discrete variables. It includes facilities for easily constructing and
training CRFs with arbitrary graphical structure. I have used it to produce
results for several of my papers, including those on DCRFs and skip-chain CRFs.
Software I Use
Here is a list of shareware and open-source software that I enjoy using.
Personal
-
What do I have in common with a popular social
bookmarking site?
If this question intrigues you,
then explore the web site of my non-work alter ego,
al.oysi.us.
-
Why doesn't this page have color? Because you haven't cracked the Easter egg yet.