CS 498 Reinforcement Learning (S21)

An introduction to reinforcement learning (RL). For a more theoretical version of the course, see CS598.

All slides, notes, and deadlines will be posted on this website.

Schedule

Date	Lecture	Comments
01/26	Introduction to the course	slides
01/28	Intro, MDP basics	slides, blackboard
02/02	Value function and Bellman equation	annotated slides (updated: 02/04)
02/04	Bellman equation	see updated slides above
02/09	Formulating problems as MDPs	reading
02/11	Value Iteration	blackboard
02/16	VI (cont.)	blackboard
02/18	Policy Iteration	blackboard, HW1 due
02/23	PI, LP	blackboard
02/25	Learning settings	slides, blackboard
03/02	MC value prediction	slides (updated: 03/06), reading: Sec 3.1 of Szepesvári
03/04	TD(0) and TD(lambda)
03/09	Function Approximation	slides (updated: 03/11), HW2 due
03/11	TD w/ FA
03/16	Control & Off-policy	slides
03/23	Importance Sampling	blackboard (updated: 03/25), reference (advanced material in this note is not covered), ref slides (not used in lecture)
03/25	IS, PG	blackboard (updated: 04/01), HW3 due EOD 03/27
03/30	PG	ref slides
04/01	PG
04/06	Abstraction	slides (updated: 04/08)
04/08	Abstraction	ref notes, HW4 due 04/12 EOD
04/13	No instruction day
04/15	Take-home exam
04/20	Exploration	slides
04/22	Exploration
04/27	Partial Observability	slides
04/29	Bayesian RL	slides
05/04	Imitation Learning	slides
05/06	No class (end of semester)	4-credit project report due

Prerequisites
Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446) is highly recommended.

Campuswire (tentative)
Please self-enroll using code 6078.

Time & Location
Tue & Thu, 2-3:15pm. Zoom link TBA.

TAs & Office Hours
Jinglin Chen and Jiawei Huang. OH TBA.

Coursework & Grading
For 3-credit students, your grade will consist of two components:

Homework (60%): There will be roughly five homework assignments, including both written and coding assignments.
Final/Late Midterm Exam (40%): There will be either a final exam or a relatively late midterm exam. Date TBA.

For 4-credit students: You will additionally need to complete a final project (20%; the weights of the other components will be reduced proportionally). You can work either on your own or in a team of two. The project should reproduce the theoretical analysis or the empirical experiments of a published RL paper; you do not need to reproduce the full paper and can be selective about which part you work on. You are expected to discuss your choice of topic with me around the middle of the semester. If you want to work on theory, please refer to the CS598 site for the guidelines and the list of seed papers (though the expected effort is lower than for a CS598 project).
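As a worked example of the proportional reduction, here is a minimal Python sketch. It assumes that "reduced proportionally" means the homework and exam weights are each multiplied by the same factor so that all weights still sum to 100%. Under that assumption the factor is (100 - 20) / 100 = 0.8, giving a 4-credit split of 48% homework, 32% exam, and 20% project. The names `four_credit_weights`, `THREE_CREDIT_WEIGHTS`, and `PROJECT_WEIGHT` are hypothetical and used only for illustration.

```python
# Hypothetical helper illustrating the grading weights described above.
# Assumption: "reduced proportionally" means every 3-credit weight is
# multiplied by the same factor so that all weights still sum to 100%.

THREE_CREDIT_WEIGHTS = {"homework": 60.0, "exam": 40.0}
PROJECT_WEIGHT = 20.0


def four_credit_weights(base: dict, project: float) -> dict:
    """Rescale the base weights so that they plus the project sum to 100."""
    scale = (100.0 - project) / sum(base.values())  # (100 - 20) / 100 = 0.8
    weights = {name: w * scale for name, w in base.items()}
    weights["project"] = project
    return weights


if __name__ == "__main__":
    w = four_credit_weights(THREE_CREDIT_WEIGHTS, PROJECT_WEIGHT)
    print(w)  # {'homework': 48.0, 'exam': 32.0, 'project': 20.0}
```

The relative weighting of homework to exam (3:2) is the same for 3-credit and 4-credit students; only the share left over after the project changes.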

Academic Integrity
Jeff Erickson has a good page on this. TL;DR from him: “Be honest. Cite your sources. We mean it. If you need help, please ask.”

Late Policy
Late homework will not be accepted. Instead, your lowest homework score will be dropped. Additional exceptions will be granted only on a case-by-case basis when compelling reasons are presented (e.g., documented emergencies).

Disability
Please let me know as soon as possible if you need accommodations for disability.

Textbook
We will not follow a specific textbook, but readings may be assigned from the following textbooks, whose PDFs are freely available online.

Reinforcement Learning: An Introduction, by Rich Sutton and Andrew Barto. (Draft available online.)
Algorithms of Reinforcement Learning, by Csaba Szepesvári. (PDF available online.)

Tentative List of Topics

MDP basics.
Planning: value iteration, policy iteration, and their analyses.
Model-based and value-based learning algorithms: certainty-equivalence, Q-learning, TD.
Policy gradient.
Importance sampling and off-policy evaluation.
State abstractions.
Partial observability.