CS 598 Statistical Reinforcement Learning (F20)

Note: This course has been approved as a regular course in the curriculum and given a regular course number, 542. For a more recent version of the course, please visit the CS 542 page.

Theory of reinforcement learning (RL), with a focus on sample complexity analyses.


Project topics and references

Please self-enroll on Piazza.

Schedule

| Date  | Lecture                                    | Comments                                                |
|-------|--------------------------------------------|---------------------------------------------------------|
| 08/26 | Overview, logistics, and MDP basics        | video, slides                                           |
| 08/28 | MDP basics                                 | video, note1, reading hw1                               |
| 09/02 | Value Iteration                            | video, blackboard                                       |
| 09/04 | VI                                         | video, blackboard                                       |
| 09/09 | Policy Iteration                           | hw2, video, blackboard                                  |
| 09/11 | PI, LP                                     | video, blackboard                                       |
| 09/16 | MAB basics                                 | video, blackboard, note2                                |
| 09/18 | Learning settings                          | video, slides                                           |
| 09/23 | Sample complexity of certainty equivalence | video, blackboard, note3, hw2 due before class          |
| 09/25 | Sample complexity of CE (cont.)            | video, blackboard                                       |
| 09/30 | Cancelled (Simons workshop)                |                                                         |
| 10/02 | State abstractions                         | video, annotated slides (updated), note4, clean slides  |
| 10/07 | State abstractions                         | video                                                   |
| 10/09 | Abstractions & FQI                         | video, clean slides                                     |
| 10/14 | FQI                                        | video                                                   |
| 10/16 | FQI                                        | video                                                   |
| 10/21 | FQI proof                                  | video, annotated slides & handwriting, note5            |
| 10/23 | Importance sampling                        | video, note6, blackboard                                |
| 10/28 | IS, PG                                     | video, blackboard, hw3 available                        |
| 10/30 | Marginalized IS                            | video, blackboard (updated: 11/4)                       |
| 11/04 | MIS                                        | video                                                   |
| 11/06 | Office hour (no lecture)                   |                                                         |
| 11/11 | Exploration (Rmax)                         | note7, video, blackboard (updated: 11/13), hw3 due      |
| 11/13 | Rmax                                       | video                                                   |
| 11/18 | Bellman rank                               | video, paper, slides (updated: 11/20)                   |
| 11/20 | OLIVE                                      | video                                                   |
| 11/25 | Fall break                                 |                                                         |
| 11/27 | Fall break                                 |                                                         |
| 12/02 | Exploration in linear MDPs                 | video, blackboard (see clarification at the end)        |
| 12/04 | Partial observability                      | video, slides                                           |
| 12/09 | PSR                                        | video                                                   |
| 12/11 | Project due (EOD)                          |                                                         |

Time & Location
Wed & Fri, 12:30-1:45pm. Zoom link

TA
Tengyang Xie (please contact the TA via Piazza)

Office Hours
TBA

Prerequisites
Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446), and preferably with reinforcement learning. It is also recommended that students be familiar with stochastic processes and numerical analysis.

Coursework & Grading
Homework may be assigned on an ad hoc basis to help students digest particular material. The main assignment will be a course project that involves literature review, reproduction of theoretical analyses in existing work, and original research (see details below). No exams.

Topics Covered in Lectures

  • Basics of MDPs and RL (a value iteration sketch follows this list).
  • Sample complexity analyses of tabular RL.
  • Policy Gradient.
  • Off-policy evaluation.
  • State abstraction theory.
  • Sample complexity analyses of approximate dynamic programming.
  • PAC exploration theory (tabular).
  • PAC exploration theory (function approximation).
  • Partial observability and dynamical system modeling.
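
To make the first couple of topics concrete, below is a minimal sketch of tabular value iteration in Python. It is an illustration only, not taken from the course materials; the shape conventions for `P` and `R` and the sup-norm stopping rule are my own choices.

```python
import numpy as np

def value_iteration(P, R, gamma, eps=1e-6):
    """Tabular value iteration on a finite MDP.

    P: transitions, shape (S, A, S), with P[s, a, s'] = Pr(s' | s, a)
    R: rewards, shape (S, A)
    gamma: discount factor in [0, 1)
    eps: stop once the sup-norm change in V falls below eps
    """
    S = R.shape[0]
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s') | s, a]
        Q = R + gamma * (P @ V)            # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q.argmax(axis=1)  # value estimate and greedy policy
        V = V_new

# A made-up two-state, two-action MDP, just to exercise the function:
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])
V, pi = value_iteration(P, R, gamma=0.9)
```

The Bellman optimality backup is a gamma-contraction in the sup norm, so the loop converges geometrically; this contraction argument appears early in the schedule above.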

Course Project

You will work individually. You can choose one of the following three types of projects:

  1. Reproduce the proofs of existing paper(s). You must fully understand the proofs and rewrite them in your own words. Sometimes a paper considers a relatively general setting and the analysis can be quite complicated. In this case, you should aim to scrutinize the results and present them in the cleanest possible way. Ask yourself: What’s the most essential part of the analysis? Can you introduce simplifying assumptions that shorten the proofs significantly without trivializing the results?

  2. Novel research. Pick a new research topic and work on it. Be sure to discuss it with me before you settle on the topic. The project must contain a significant theoretical component.

  3. Something between 1 & 2. I would encourage most of you to start in this category. The idea is to reproduce the proofs of existing results and see if you can extend the analysis to a more challenging and/or interesting setting. This way, even if you do not obtain the new results before the end of the semester, your project can simply fall back to category 1.

See the link at the top of this page for potential topics. You are expected to submit a short project proposal in the middle of the semester. The proposal should consist of a short paragraph describing your project topic, the papers you plan to work on, and the original research question (if applicable).

Resources

Useful inequalities cheat sheet (by László Kozma)

Concentration of measure (by John Lafferty, Han Liu, and Larry Wasserman)
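
For a flavor of the material in these references: Hoeffding's inequality, a workhorse of sample complexity analyses, states that for i.i.d. random variables $X_1, \dots, X_n \in [0, 1]$ with mean $\mu$ and empirical mean $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n X_i$,

$$\Pr\left(|\hat{\mu} - \mu| \ge \epsilon\right) \le 2\exp(-2n\epsilon^2),$$

so $n \ge \log(2/\delta)/(2\epsilon^2)$ samples suffice to estimate $\mu$ to accuracy $\epsilon$ with probability at least $1 - \delta$. Bounds of this form are basic building blocks of the tabular sample complexity results covered in lecture.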

We will not follow a specific textbook, but here are some good books that you can consult:

  • Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin Puterman.
  • Reinforcement Learning: An Introduction, by Rich Sutton and Andrew Barto. (draft available online)
  • Algorithms of Reinforcement Learning, by Csaba Szepesvari. (pdf available online)
  • Neuro-Dynamic Programming, by Dimitri Bertsekas and John Tsitsiklis.

Alekh Agarwal, Sham Kakade, and I also have a draft monograph which contains some of the lecture notes from this course.

There are also many related courses whose material is available online. Here is an incomplete list of them (not in any particular order):

  • R. Srikant. UIUC ECE 586.
  • Ron Parr. Duke CompSci 590.2.
  • Ben Van Roy. Stanford MS&E 338.
  • Ambuj Tewari and Susan Murphy. U Michigan STATS 710.
  • Susan Murphy. Harvard Stat 234.
  • Alekh Agarwal and Alex Slivkins. Columbia COMS E6998.001.
  • Daniel Russo. Columbia B9140-001.
  • Shipra Agrawal. Columbia IEOR 8100.
  • Emma Brunskill. CMU 15-889e.
  • Philip Thomas. U Mass CMPSCI 687.
  • Michael Littman. Brown CSCI2951-F.