Computer Science 294
Practical Machine Learning
Prof. Michael Jordan
Monday 4:00-6:00 PM, Tan Hall 180
Office hours for the lecturer of the week: Thursday 12:30-2:00 PM, Soda Hall, Alcove 511
Fall 2006
Announcements
- Jan 23: The project reports and posters webpage is now online!
- Nov 30: The second part of homework 5 is now posted.
- Nov 22: The first part of homework 5 is now posted.
- Oct 22: There is an error in the code for problem 2 of Part 2. Download the revised version here. Also, you actually don't need to modify computePCA().
- Oct 17: There is a typo in problem 1(b) of homework 3. Please download
the revised version.
- Oct 16: Percy's office hours will be from 1-2:30pm
on Thursday, Oct. 19 in the Soda 511 alcove.
- Sept 26:
- A wiki and a discussion group for the class have been added to
bSpace. Feel free to use them to
discuss assignments, project topics, questions about the class, etc. A thread for
finding project partners has already been created in the discussion group. To access
bSpace, simply visit https://bspace.berkeley.edu
and log in using your CalNet ID. If you don't have a CalNet ID and you want to join,
send an email to slacoste-AT-eecs to request a guest account. Everyone who is
registered for the class already has the cs294-10 site added. If you're not registered
for the class, you can add it by going through My Workspace | Membership, then clicking
on 'Joinable Sites' and searching for '294'. COMPSCI 294 10 F06 should be in the results.
- The project info has been posted. Take note of the deadline for the deliverables.
- Sept 19: Romain's office hours will be Thursday from 12:30-2pm in Soda 511 (alcove).
- Sept 14:
- All of Homework 1 is posted
- Slides for the classification lecture have been posted.
- Please take the quick survey before next Wednesday (it will help us tailor the class for you). Click here to take the survey.
- Sep 11:
- Please sign up for the class mailing list [even though you gave your email in the
first class - THIS IS A NEW LIST] by sending an email with the words "subscribe cs294-fall06" in the
body (without the quotes) to majordomo@lists.berkeley.edu
The text has to be in the body; the subject line is not processed.
- A draft of the first part of the assignment has been posted. The full assignment will be posted Wednesday; due Monday Sept 25.
- Simon's office hours will be Thursday from 12:30-2pm in Soda 511 (alcove). Feel free to drop by with project topic questions as well.
- The slides about classification as well as pointers to relevant literature will be posted Wednesday.
- A survey will be posted on this website on Wednesday as well. Please fill it in to let us know what your background is and what you want to get from this class.
- Until most people have signed up for the mailing list, please keep watching this website for announcements.
- Aug 30: Starting Sept 11th, the class will be held in 180 Tan Hall
- Aug 29: The tutorial section will be Friday Sept. 1, 3:00-5:00pm in Soda 306
- Aug 22: The first day of class is August 28th.
Topics
Prerequisites
- Some prior exposure to probability and to linear algebra
Suggested Reading
Readings for the specific sections will be provided in the future. There are
several good resources which contain general information.
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (book's web site)
- Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques (book's web site)
- Andrew Moore's Tutorials are
a collection of PDF tutorials on many of the topics that will be covered in the class.
Homework
Students will be required to complete bi-weekly homework assignments.
These must be turned in on time to receive credit. There will also
be a final project. A project report will be required and projects
will also be presented in an end-of-term poster session. The homeworks
will count for 60% of the grade and the project will count for
40% of the grade.
- A complete version of Homework 1 is available. Dataset as tgz or zip file. The
homework is due in class on Monday Sept 25. Direct questions on the tutorial
and classification parts to Alex Simma (asimma-at-eecs) and regression questions
to Romain (thibaux-at-eecs).
- Homework 2 is now available. You'll also need the supplementary data and code. This homework is due in class on Monday, October 9th. Questions on the feature selection part (questions 1-2) should be sent to Ben Blum (bblum-at-eecs) and questions on the diagnostics section (question 3) should be sent to Gad Kimmel (kimmel-at-eecs).
- Homework 3 (updated version) is now available.
You'll also need the supplementary data and code.
This homework is due in class on Monday, October 23rd.
Questions on clustering should be sent to Junming Yin (junming-at-eecs)
and questions on linear dimensionality reduction should be sent to Percy Liang (pliang-at-eecs).
- Homework 4 is now available.
This homework is due in class on Monday, November 13th.
You'll need supplementary code/data for the
first part and second
part of the homework. Questions on hidden Markov models should
be sent to Erik Sudderth (sudderth-at-eecs), questions on
anomaly/sequential detection should be sent to
XuanLong Nguyen (xuanlong-at-eecs), and questions on reinforcement
learning should be sent to Peter Bodik (bodikp-at-eecs).
- The first part of Homework 5 is now available. The second part is here. The whole assignment is due on the day of the poster session,
Monday December 11th. Don't wait until the last minute to finish this assignment, as you also
have the poster and the project report to do! The extra time is meant to give you more flexibility.
This is the last homework. Questions on nonlinear dimensionality reduction should be sent to Fei
Sha (feisha-at-eecs) and questions on structured classification should be sent to Guillaume
Obozinski (gobo-at-stat).
Project
The project counts for roughly 40% of your grade. We will use the same guidelines as
last year's cs281a [though with a less theoretical flavor];
please read them here.
The main idea is to have you apply a concept from the class in your own research, or explore it further through
experimentation. The evaluation of the project will be based on the following three deliverables:
- Submit on bSpace one paragraph describing your project plan or ideas by Monday October 30th.
The idea is to have you start working on the project before
December... Feel free to come to OH to discuss project ideas, to send emails to the lecturers,
or to use the wiki/discussion group on bSpace to brainstorm ideas.
- You will present a poster about your project on
Monday December 11th from 1:00-4:00 pm in Soda 306.
- Submit your project write-up on bSpace by Tuesday December 12th.
The guideline page mentioned above contains examples of project write-ups and posters,
just to give you an idea of what one can do.
Final list of project reports and posters.
Software
There is a wide variety of data mining and machine learning software
available.
- Weka is a large
Java package implementing many learning algorithms.
- YALE (Yet Another Learning Environment)
is an alternative (and complementary) Java package. It includes a GUI which
allows automation of the whole data path from feature normalization through
feature selection, learning and cross validation.
- SVM light and LibSVM are two popular implementations of
various SVM algorithms.
- R is an interactive programming
language designed for statistics. Many very useful libraries are available.
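The data path mentioned above (feature normalization, feature selection, learning, cross validation) can be sketched in a few lines of plain Python. This is only an illustration and is not taken from YALE, Weka, or the course code: the function names, the toy data, the class-mean-gap selection criterion, and the nearest-centroid learner are all assumptions chosen to keep the example short. The one point it is meant to show is that normalization and feature selection are re-estimated on the training part of each fold, so the held-out part never influences them.

```python
import random
import statistics


def zscore_params(rows):
    """Per-feature mean and standard deviation, estimated from the given rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant features
    return means, stds


def normalize(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def select_features(rows, labels, k):
    """Keep the k features with the largest gap between the two class means.

    Assumes binary labels 0/1 and that both classes are present in rows.
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    gaps = [abs(statistics.fmean(p) - statistics.fmean(n))
            for p, n in zip(zip(*pos), zip(*neg))]
    return sorted(range(len(gaps)), key=lambda j: -gaps[j])[:k]


def take(row, keep):
    return [row[j] for j in keep]


def fit_centroids(rows, labels):
    """Nearest-centroid learner: store one mean vector per class."""
    centroids = {}
    for y in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == y]
        centroids[y] = [statistics.fmean(c) for c in zip(*members)]
    return centroids


def predict(centroids, row):
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def cross_validate(rows, labels, n_folds=5, k=2, seed=0):
    """Mean held-out accuracy; every step is refit on each training split."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_x = [rows[i] for i in train]
        train_y = [labels[i] for i in train]
        means, stds = zscore_params(train_x)
        train_x = normalize(train_x, means, stds)
        keep = select_features(train_x, train_y, k)
        model = fit_centroids([take(r, keep) for r in train_x], train_y)
        test_x = normalize([rows[i] for i in held_out], means, stds)
        correct = sum(predict(model, take(r, keep)) == labels[i]
                      for r, i in zip(test_x, held_out))
        accuracies.append(correct / len(held_out))
    return statistics.fmean(accuracies)


def make_toy_data(n=100, seed=1):
    """Synthetic data: features 0-1 depend on the label, features 2-4 are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        informative = [rng.gauss(3.0 * y, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean 5-fold accuracy: {cross_validate(X, y):.2f}")
```

The packages listed above provide far more complete versions of each step; the sketch only shows how the steps fit together.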