CS 598 Statistical Reinforcement Learning

CS 598 Statistical Reinforcement Learning (F18)

Theory of reinforcement learning (RL), with a focus on sample complexity analyses.

Schedule

Date	Lecture	Comments
08/28	Overview, logistics, and MDP basics	slides, hw1
08/30	Value Iteration	note1 (09/11: expand proof of Thm 3)
09/04	Policy Iteration	in-class proof, quiz answers
09/06	Concentration Inequalities & Bandits	note2
09/11	Recap & RL settings	slides
09/13	Sample complexity of certainty equivalence	note3 (updated: 09/20)
09/18	Sample complexity of CE (cont.)	optional reading
09/20	State abstractions	note4 (updated: 10/04)
09/25	State abs. (cont.)	slides
09/27	State abs. (cont.)	see updated slides for 09/25
10/02	Finish state abs., transition to FQI	slides
10/04	Fitted Q-Iteration
10/09	Analysis of FQI	see updated slides for 10/02
10/11	Analysis of FQI (cont.)	note5 (10/16: ref for Bernstein’s; corrected a constant in final bound)
10/16	Importance Sampling	note6 (10/23: error in variance analysis corrected)
10/18	IS
10/23	IS and Policy Gradient	Proposal Due
10/25	Finish PG. Start exploration.	derivation on board
10/30	PAC exploration with Rmax
11/01	Rmax exploration	note7
11/06	Bellman rank	slides, paper
11/08	Bellman rank and OLIVE
11/13	No class (I’m traveling)
11/15	Analysis of OLIVE, volumetric argument	note
11/20	No class (fall break)
11/22	No class (fall break)
11/27	Partially observable systems
11/29	PSR	slides
12/04	No class (NeurIPS)
12/06	No class (NeurIPS)
12/11	Common misconceptions & research trends
12/13	No class (reading day)	Project final report due

Time & Location
Tue & Thu, 02:00-03:15pm, 1302 Siebel.

Office Hours
By appointment. My office is 3322 Siebel.

Prerequisites
Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446), and preferably reinforcement learning. It is also recommended that the students are familiar with stochastic processes and numerical analysis.

Coursework & Grading
Homework may be assigned on an ad hoc basis to help students digest particular material. The main assignment will be a course project that involves literature review, reproduction of theoretical analyses in existing work, and original research (see details below). No exams. Note that this is a grad-level seminar course—if you are concerned about precise grading criteria, this is probably not the right class for you.

Topics Covered in Lectures

Basics of MDPs and RL.
Sample complexity analyses of tabular RL.
Importance sampling and Monte-Carlo methods.
State abstraction theory.
Sample complexity analyses of approximate dynamic programming.
PAC exploration theory (tabular).
PAC exploration theory (function approximation).

Course Project

By default you will work individually. Depending on the class size after the dropping deadline, you might be allowed to work in pairs. This will be announced later.

You can choose one of the following three types of projects:

Reproduce the proofs of existing paper(s). You must fully understand the proofs and rewrite them in your own words. Sometimes a paper considers a relatively general setting and the analysis can be quite complicated. In this case you should aim at scrutinizing the results and presenting them in the cleanest possible way. Ask yourself: What’s the most essential part of the analysis? Can you introduce simplification assumptions to simplify the proofs singificantly without trivializing the results?
Novel research Pick a new research topic and work on it. Be sure to discuss with me before you settle on the topic. The project must contain a significant theoretical component.
Something between 1 & 2 I would encourage most of you to start in this category. The idea is to reproduce the proofs of existing results and see if you can extend the analysis to a more challenging and/or interesting setting. This way, even if you do not get the new results before the end of semester, your project will just fall back to category 1.

I will provide a list of papers and potential project topics by the end of September. You are expected to submit a short project proposal by the end of October. The proposal should consist of a short paragraph describing your project topic, the papers you plan to work on, and the original research question (if applicable).

Resources

Useful inequalities cheat sheet (by László Kozma)

Concentration of measure (by John Lafferty, Han Liu, and Larry Wasserman)

We will not follow a specific textbook, but here are some good books that you can consult:

Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin Puterman.
Reinforcement Learning: An Introduction, by Rich Sutton and Andrew Barto. (draft available online)
Algorithms of Reinforcement Learning, by Csaba Szepesvari. (pdf available online)
Neuro-Dynamic Programming, by Dimitri Bertsekas and John Tsitsiklis.

There are also many related courses whose material is available online. Here is an incomplete list of them (not in any particular order):

Ron Parr. Duke CompSci 590.2.
Ben Van Roy. Stanford MS&E 338.
Ambuj Tewari and Susan Murphy. U Michigan STATS 710.
Susan Murphy. Harvard Stat 234.
Alekh Agarwal and Alex Slivkins. Columbia COMS E6998.001.
Daniel Russo. Columbia B9140-001.
Shipra Agrawal. Columbia IEOR 8100.
Emma Brunskill CMU 15-889e.
Philip Thomas. U Mass CMPSCI 687.
Michael Littman. Brown CSCI2951-F.
R. Srikant. UIUC ECE 586.