CS 288: Statistical Natural Language Processing, Spring 2011

Instructor: Dan Klein
Lecture: Tuesday and Thursday 12:30pm-2:00pm, 405 Soda Hall
Office Hours: Tuesday and Thursday 3:30pm-4:30pm in 724 (or 730) Sutardja Dai Hall.
GSI: Adam Pauls
Office Hours: Wednesday 4-5pm, 751 Soda Hall


1/16/11:  The previous website has been archived.
1/20/11:  Assignment 1 has been posted. It is due on February 3rd.
2/07/11:  An online forum has been created for this class. The course staff (Adam) will check this forum regularly and answer questions as they arise. Important class announcements will also be posted here (and on this web page).
2/07/11:  Assignment 2 has been posted. It is due on February 24th.
2/10/11:  A bug in Assignment 2 has been fixed. Please download the latest version of the code.
2/21/11:  Writing comments are available, along with some sample write-ups.
2/21/11:  Amazon has given us a grant that provides each student with $100 of EC2 credits! Email Adam for an access code. Some (brief) instructions are available.
2/28/11:  Another bug in Assignment 2 has been fixed. The bug caused the distortion score to be ignored when scoring a hypothesis. Please download the latest version of the code.
3/01/11:  Assignment 3 has been posted. It is due on March 14th.
3/01/11:  Final project guidelines have been posted.
4/08/11:  Assignment 4 has been posted. It is due on May 9th.


This course will explore current statistical techniques for the automatic analysis of natural (human) language data. The dominant modeling paradigm is corpus-driven statistical learning, with a split focus between supervised and unsupervised methods.  This term has a new syllabus concentrating on machine translation and, to a lesser extent, structured classification, so (1) the syllabus is more tentative than usual and (2) the projects will be new this term.

This course assumes a good background in basic probability and a strong ability to program in Java. Prior experience with linguistics or natural languages is helpful, but not required.  There will be a lot of statistics, algorithms, and coding in this class.  The recommended background is CS 188 (or CS 281A) and CS 170 (or CS 270).  An A in CS 188 (or CS 281A) is required.  This course will be more work-intensive than most graduate or undergraduate courses.


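The primary recommended texts for this course are:

Jurafsky and Martin, Speech and Language Processing, 2nd edition (J+M)
Manning and Schütze, Foundations of Statistical Natural Language Processing (M+S)

Note that M+S is free online.  Also, make sure you get the purple 2nd edition of J+M, not the white 1st edition.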

Syllabus [subject to substantial change!]

Week Date Topics Techniques Readings Assignments (Out) Assignments (Due)
1 Jan 18 Course Introduction [2PP] [6PP] J+M 1, M+S 1-3    
Jan 20 Language Modeling I [2PP] [6PP] KN / Smoothing J+M 4, M+S 6, Chen & Goodman, Interpreting KN HW1: Language Models
2 Jan 25 Language Modeling II [2PP] [6PP] Large Data Massive Data, Bloom, Perfect, Efficient LMs  
Jan 27 Speech Recognition I [2PP] [6PP] Phonetics J+M 7  
3 Feb 1 Speech Recognition II [2PP] [6PP] HMMs J+M 9  
Feb 3 Part-of-Speech [2PP] [6PP] Decoding J+M 5, Brants, Toutanova & Manning   HW1
4 Feb 8 Phrase-Based MT [2PP] [6PP] Decoding HW2: Phrase MT
Feb 10 Alignment I [2PP] [6PP] IBM Models J+M 25, IBM Models, HMM Agreement Discriminative, Decoding    
5 Feb 15 Alignment II [2PP] [6PP] EM      
Feb 17 Phrase Alignment [2PP] [6PP] Phrase Alignment Learning Phrases, Generative  
6 Feb 22 Structured Classification I [2PP] [6PP] Margin      
Feb 24 Structured Classification II [2PP] [6PP] Likelihood     HW2
7 Mar 1 Structured Classification III [2PP] [6PP] Kernels HW3: Alignment  
Mar 3 Structured Classification IV [2PP] [6PP] Structure M3Ns, Cutting Plane    
8 Mar 8 Parsing I [2PP] [6PP] PCFGs M+S 3.2, 12.1, J+M 11    
Mar 10 Parsing II [2PP] [6PP] PCFGs M+S 11, J+M 12, Best-First, A*, Unlexicalized    
9 Mar 15 Parsing III [2PP] [6PP] Other Models Split, Lexicalized, K-Best   HW3
Mar 17 Parsing IV [2PP] [6PP] Reranking    
10 Mar 22 Spring Break
Mar 24 Spring Break
11 Mar 29 Syntactic MT I [2PP] [6PP] Hiero, String-Tree, Tree-String, Tree-Tree FP Guidelines  
Mar 31 Syntactic MT II [a-2PP] [a-6PP] [b]      
12 Apr 5 Semantics I [2PP] [6PP] SRL / Montague J+M 16, 19, Manning, J+M 18    
Apr 7 Semantics II [2PP] [6PP] LF Parsing Parsing to LF HW4: Parsing / Classification  
13 Apr 12 Semantics III [1PP] Grounded      
Apr 14 Coreference [2PP] [6PP] Supervised, Unsupervised, J+M 21  
14 Apr 19 Summarization [2PP] [6PP] Topic-based, N-gram based  
Apr 21 Question Answering [2PP] [6PP] N-gram-based, Grammar-based  
15 Apr 26 Grammar Induction [2PP] [6PP]    
Apr 28 Diachronics [2PP] [6PP] Reconstruction   FP Due May 17th