CS 542 Statistical Reinforcement Learning

CS 542 Statistical Reinforcement Learning (F23)

Theory of reinforcement learning (RL), with a focus on sample complexity analyses.

Previous semesters: F22, F21; as CS598: F20, S19, F18

11/08: Schedule change: we will meet on 11/15 (Wed) as usual (Friday lecture is still cancelled). All remaining lectures are moved up by one slot and we will finish the course a little earlier.

10/20: Schedule for the rest of the semester is tentative and subject to change.

Schedule

Date	Lecture	Comments
08/23	Overview & logistics	slides (updated: 08/30)
08/25	MDP basics	note1, reading homework 0
08/30	Bellman equations
09/01	Value Iteration	blackboard
09/06	VI	blackboard
09/08	Policy Iteration	blackboard, HW1 available
09/13	PI, LP	blackboard
09/15	Concentration inequalities	note2, blackboard
09/20	Certainty Equivalence	blackboard (updated: 09/22), note3
09/22	CE	HW1 due
09/27	Abstraction	note4, slides, HW2 available
09/29	Abstraction	annotated slides
10/04	Fitted-Q	slides
10/06	Fitted-Q
10/11	FQE analysis	annonated slides, note5, Akshay’s note, HW2 due
10/13	FQI analysis, IS	blackboard (FQI), (IS), note6, HW3 available
10/18	Importance sampling	blackboard
10/20	Policy gradient	see updated note6, blackboard; Proposal due
10/25	MIS, Online RL	blackboard (MIS), Ref: MWL/MQL, interval; blackboard (online)
10/27	Rmax exploration	note7, blackboard, HW3 due, HW4 available
11/01	Bellman rank	slides
11/03	OLIVE	blackboard, paper
11/08	Martingale concentration & ridge regression	note8
11/11	Elliptical potential & covering	blackboard, HW4 due
11/15	LSVI-UCB	blackboard
11/16	Seminar, 2:30-3:30pm, zoom link on Canvas	talk recording
11/17	No class
11/22	Fall break (no class)
11/24	Fall break (no class)
11/29	Partial observability	slides
12/01	PSRs
12/06	No class
12/07	Reading day (no class)	Project report due EOD

Time & Location
Wed & Fri, 2-3:15pm. 0216 Siebel.

Recording
Lectures will be recorded and made available on Mediaspace. You can subscribe to the channel to be alerted about new recordings.

Online Platform
Canvas will be used as the platform for announcement, discussions, and homework submissions. It is important that you join Canvas and turn on notifications to receive announcements in a timely manner.

TA
Philip Amortila, Yuheng Zhang. (Please contact the TAs via Canvas messages.)

Office Hours
I will stay till 4pm for questions after each lecture (may leave early if no one is in the classroom). Additional TA OHs may be added in an on-demand manner.

Prerequisites
Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446), and preferably reinforcement learning. It is also recommended that the students are familiar with stochastic processes and numerical analysis.

Coursework & Grading
The main assignment will be a course project that involves literature review, reproduction of theoretical analyses in existing work, and original research (see details below). No exams. Grades decomposition: tentatively 40% homework and 60% project; subject to +-10% changes depending on the actual number of graded homework assignments (expected: 3; there may be separate ad hoc assignments that are not graded).

Statement on CS CARES and CS Values and Code of Conduct
All members of the Illinois Computer Science department - faculty, staff, and students - are expected to adhere to the CS Values and Code of Conduct. The CS CARES Committee is available to serve as a resource to help people who are concerned about or experience a potential violation of the Code. If you experience such issues, please contact the CS CARES Committee. The instructors of this course are also available for issues related to this class.

Topics Covered in Lectures

Basics of MDPs and RL.
Sample complexity analyses of tabular RL.
Policy Gradient.
Off-policy evaluation.
State abstraction theory.
Sample complexity analyses of approximate dynamic programming.
PAC exploration theory (tabular).
PAC exploration theory (function approximation).
Partial observability and dynamical system modeling.

Course Project

You will work individually. You can choose one of the following three types of projects:

Reproduce the proofs of existing paper(s). You must fully understand the proofs and rewrite them in your own words. Sometimes a paper considers a relatively general setting and the analysis can be quite complicated. In this case you should aim at scrutinizing the results and presenting them in the cleanest possible way. Ask yourself: What’s the most essential part of the analysis? Can you introduce simplification assumptions to simplify the proofs sigificantly without trivializing the results?
Novel research Pick a new research topic and work on it. Be sure to discuss with me before you settle on the topic. The project must contain a significant theoretical component.
Something between 1 & 2 I would encourage most of you to start in this category. The idea is to reproduce the proofs of existing results and see if you can extend the analysis to a more challenging and/or interesting setting. This way, even if you do not get the new results before the end of semester, your project will just fall back to category 1.

See the link at the top of this page for potential topics. You are expected to submit a short project proposal in the middle of the semester. The proposal should consist of a short paragraph describing your project topic, the papers you plan to work on, and the original research question (if applicable).

Resources

Useful inequalities cheat sheet (by László Kozma)

Concentration of measure (by John Lafferty, Han Liu, and Larry Wasserman)

We will not follow a specific textbook, but here are some good books that you can consult:

Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin Puterman.
Reinforcement Learning: An Introduction, by Rich Sutton and Andrew Barto. (draft available online)
Algorithms of Reinforcement Learning, by Csaba Szepesvari. (pdf available online)
Neuro-Dynamic Programming, by Dimitri Bertsekas and John Tsitsiklis.

Alekh Agarwal, Sham Kakade, Wen Sun, and I also have a draft monograph which contained some of the lecture notes from this course.

There are also many related courses whose material is available online. Here is an incomplete list (not in any particular order; list from 2019 and has not been updated since then):

R. Srikant. UIUC ECE 586.
Ron Parr. Duke CompSci 590.2.
Ben Van Roy. Stanford MS&E 338.
Ambuj Tewari and Susan Murphy. U Michigan STATS 710.
Susan Murphy. Harvard Stat 234.
Alekh Agarwal and Alex Slivkins. Columbia COMS E6998.001.
Daniel Russo. Columbia B9140-001.
Shipra Agrawal. Columbia IEOR 8100.
Emma Brunskill CMU 15-889e.
Philip Thomas. U Mass CMPSCI 687.
Michael Littman. Brown CSCI2951-F.