CS 294-125, Spring 2016: Human-compatible AI
Reading list

This list is still under construction. An empty bullet item indicates more readings to come for that week.

Books


Week 1 (1/26): Markov decision processes

TODO

Week 2 (2/2): Reinforcement learning, multi-attribute utility theory, preference elicitation


Week 3 (2/9): Goal inference


Week 4 (2/16): Human preferences


Week 5 (2/23): Collaborative systems


Week 6 (3/1): Psychology of moral decisions


Week 7 (3/8): Inverse reinforcement learning


Week 8 (3/15): Inverse reinforcement learning (cont'd)


Week 9 (3/22): Spring break

Week 10 (3/29): Multiagent sequential decision making


Week 11 (4/5): Game theory


Week 12 (4/12): Inverse games


Week 13 (4/19): Embedded reinforcement learning, Baldwinian evolution

  • Mark Ring and Laurent Orseau, "Delusion, Survival, and Intelligent Agents." In Proc. AGI, 2011.
    Describes a possible failure mode of reward-based agents, in which the agent builds a "delusion box" that feeds it fake rewards instead of pursuing the goal the reward was meant to encode.
    • (optional) Daniel Dewey, "Learning What to Value." In Proc. AGI, 2011.
      Argues that wireheading arises from the reinforcement learning formulation itself and proposes instead an approach based on learning an initially unknown utility function.
    • (optional) Bill Hibbard, "Model-based Utility Functions." JAGI, 3(1), 1-24, 2012.
      Proposes and analyzes a solution to the wireheading problem based on utility functions that depend on unobserved state variables whose values the agent must infer.
    • (optional) Laurent Orseau and Mark Ring, "Space-Time Embedded Intelligence." In Proc. AGI, 2012.
      Defines a very general notion of rationality for agents whose computational substrate is part of the environment they inhabit.
  • David Ackley and Michael Littman, "Interactions between Learning and Evolution." In Proc. Artificial Life II, 1991.
    Discusses the origin of reward functions and how learning speeds up evolution, clarifying the Baldwin effect, which was first proposed in 1896.

Week 14 (4/26): Corrigibility

  • Nate Soares et al., "Corrigibility." In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

Week 15: Reading/Review/Recitation