CS 294-6 Recognizing People, Objects and Actions

Jitendra Malik
Spring 2004
405 Soda Hall
Tu 4-6

Course Content

This course is designed around the challenge problem of making
computers aware of the everyday visual world, i.e., processing images
or video to recognize categories such as cars, buses, tigers, zebras,
rooms, doors, telephones, faces, arms and hands, as well as actions
such as running, jumping and kicking. Topics will include a survey of
human visual recognition (perception and physiology), recognition in
the presence of transformations, local matching techniques, global
matching techniques, segmentation as a front end, motion descriptors
for action recognition, and case studies of recognition in different
domains. To focus our thinking, I have a specific list of about 300
visual categories.

Lecture Topics

  1. Introduction: Characteristics of visual recognition. Prototypes and affordances. Basic, superordinate and subordinate categories (reference: Palmer, Chapter 9)
  2. Human visual system
    1. Basic computations in retina, LGN, V1, V2
    2. Models of receptive fields: center-surround, oriented, simple/complex
    3. Cortical magnification factor, log-polar mapping
  3. Five approaches to handwritten digit recognition
    1. LeCun's convolutional neural nets
    2. Simard et al.'s Tangent Distance
    3. Belongie, Malik and Puzicha: Shape Contexts
    4. Decoste & Schölkopf: Invariant SVMs
    5. Amit, Geman and Wilder: Randomized Decision Trees
  4. Template matching using distance transform variants
    1. Chamfer distance
      1. Barrow et al.
      2. Borgefors
      3. Gavrila & Philomin
    2. Hausdorff distance
      1. Huttenlocher, Klanderman & Rucklidge
      2. Olson & Huttenlocher
  5. Discussion of transformations in general
    1. D'Arcy Thompson, Fischler and Elschlager, Grenander
    2. Similarity and affine transforms
    3. Smooth diffeomorphisms, thin-plate splines
  6. Local scale-invariant keypoint features
    1. David Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004
    2. Tony Lindeberg, Principles for Automatic Scale Selection, CVAP KTH Tech Report
  7. Pose estimation, pose clustering, geometric hashing, basis views
  8. Multiple-view approaches to 3D objects: aspects, k-medoids
  9. Perceptual Organization - Grouping, figure/ground
  10. The Human Body
  11. Human Movement
  12. Scenes
  13. Project presentations
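
Topic 4 above refers to template matching with distance transforms. As a rough illustration only (not taken from the cited papers), the following Python sketch computes a brute-force Euclidean distance transform of an edge map, scores translated placements of an edge template by the mean distance under its points (a chamfer-style score), and computes the symmetric Hausdorff distance between two point sets. The names `distance_transform`, `chamfer_score`, `directed_hausdorff` and `hausdorff`, the toy L-shaped template, and the restriction to pure translation are all hypothetical choices made for this example.

```python
import math

Point = tuple[int, int]  # (x, y) pixel coordinates


def distance_transform(edges: set[Point], width: int, height: int) -> list[list[float]]:
    """Euclidean distance from every pixel to its nearest edge pixel, indexed dt[y][x].

    Brute force, O(width * height * len(edges)); practical systems use fast
    two-pass (chamfer) approximations instead.
    """
    return [[min(math.dist((x, y), e) for e in edges) for x in range(width)]
            for y in range(height)]


def chamfer_score(template: list[Point], dt: list[list[float]], dx: int, dy: int) -> float:
    """Mean distance-transform value under the template translated by (dx, dy).

    Assumes every translated template point lies inside the image.
    """
    return sum(dt[y + dy][x + dx] for x, y in template) / len(template)


def directed_hausdorff(a: list[Point], b: list[Point]) -> float:
    """Max over points p in a of the distance from p to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: list[Point], b: list[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy example: an L-shaped template and an 8x8 edge map containing the
# same shape shifted by (3, 2).
template = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
edges = {(x + 3, y + 2) for x, y in template}
dt = distance_transform(edges, width=8, height=8)

# Exhaustive search over translations that keep the template inside the image.
best_score, best_shift = min(
    (chamfer_score(template, dt, dx, dy), (dx, dy))
    for dx in range(6) for dy in range(6)
)
print(best_score, best_shift)  # 0.0 (3, 2)

# The directed distance is asymmetric: (0, 0) is matched exactly, but the
# extra point (10, 0) in the second set is 10 pixels from the first set.
print(hausdorff([(0, 0)], [(0, 0), (10, 0)]))  # 10.0
```

Because the distance transform is computed once per image, each candidate placement costs only one lookup per template point, which is the usual motivation for distance-transform matching. This sketch omits refinements such as the partial (rank-order) Hausdorff distance and searches over translation only.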

There is no required text for this course. Steve Palmer's Vision Science and Forsyth and Ponce's Computer Vision: A Modern Approach are useful sources of material.

We will use a scribe system to make course notes available throughout the semester. For each lecture, one or two students, taking turns, will take notes and type them up; I'll edit the notes and make them available on the web.

The grade will be determined by a combination of homework assignments, scribe notes, and a final project. The project could be a mathematical or statistical analysis of a visual task, the implementation of an interesting algorithm, or a psychophysical experiment.

You'll be encouraged to work in teams on the projects and the homework assignments.

I hope you enjoy the course!

General Papers

Lecture Notes

Homework