CS 294. Learning in Sequential Decision Problems


This course will focus on the design and theoretical analysis of learning methods for sequential decision-making under uncertainty. Sequential decision problems involve a trade-off between exploitation (optimizing performance based on the information at hand) and exploration (gathering more information). These problems arise in many important domains, ranging from clinical trials, through computer network optimization and adaptive packet routing, to website and page content optimization, marketing campaign and internet advertising optimization, and revenue management.

Topics covered will include a selection from the following list. Stochastic and game theoretic formulations of sequential decision problems: multi-armed bandits, linear, convex, and Lipschitz bandits, large-scale (combinatorial) bandit problems, contextual bandits, and reinforcement learning/approximate dynamic programming approaches to controlling Markov decision processes; and tools for finite-sample performance analysis.

See the course website for more information.

General Catalog