CS 443 Reinforcement Learning (S24)

Introduction to reinforcement learning (RL).

Previous semesters: S23, S21, F19.
Also see CS 542 for a more theoretical version of the course.

All slides and notes will be posted on this website.

Schedule

Date	Lecture	Comments
01/16	Introduction	slides
01/18	MDP formulation	slides
01/23	Value function	note, meet over Zoom
01/25	Value function (cont.)
01/30	Optimality
02/01	Value iteration
02/06	VI (cont.)
02/08	Policy iteration
02/13	PI (cont.)	HW1 due before class
02/15	Learning settings	slides
02/20	Value prediction	slides, reading: Sec 3.1 of Szepesvári
02/22	TD	annotated slides (updated: 02/27)
02/27	TD lambda	HW2 due before class
02/29	Function approximation	slides, annotated, reading: Sec 3.2 of Szepesvári
03/05	Control	slides, reading: Sutton & Barto, Chap 10
03/07	Off-policy learning	annotated slides, project proposal due EOW
03/19	Importance sampling	reference, slides
03/26	Policy Gradient	slides (updated: 03/28), HW3 due
03/28	PG
04/02	State abstraction	slides
04/04	Abstraction	ref
04/09	In-class Office Hour	HW4 due, Practice exam released
04/11	In-class Office Hour	Final exam in the week after
04/15	Exam	Start 1:30pm, 1404 SC, Monday
04/18	Exploration	slides
04/23	Partial observability	slides
04/25	Bayesian RL	slides
04/30	Imitation learning	slides, 4 credit project due 05/05 EOD

Prerequisites
Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446) is highly recommended.

Time & Location
Tue & Thu, 2-3:15pm. 1306 Everitt Lab.

Lecture recording
Please see this channel on Mediaspace. You can subscribe to it to be automatically notified of new recordings.

Canvas
Canvas will be the main platform for homework assignments and discussions. All announcements (including new assignments when they are created) will be made through Canvas, so make sure you turn on system notification emails. If you registered after Jan 10, please contact the TAs to be added to Canvas.

TAs & Office Hours
The-Anh Vu-Le and Rohan Deb. See the Canvas announcement for office hours.

Textbook
We will not follow a specific textbook, but readings may be assigned from the following textbooks, whose PDFs are freely available online.

Reinforcement Learning: An Introduction, by Rich Sutton and Andrew Barto. (available online; also old version here)
Algorithms of Reinforcement Learning, by Csaba Szepesvári. (pdf available online)

Coursework & Grading
For 3 credit students: Your grade will consist of 2 components:

Homework (~70%): There will be 4 to 5 homework assignments, including both written and coding assignments.
Final/Late Midterm Exam (~30%): There will be a final exam (or a midterm exam held relatively late in the semester). Date TBA.

For 4 credit students: You will additionally need to work on a final project (20%; the weights of the other components will be reduced proportionally). You can work either on your own or in a team of 2. The project should reproduce the theoretical analysis or the empirical experiments of a published paper on RL; you do not need to reproduce the full paper and can be selective about which part you work on. You are expected to discuss your choice of topic with me in the middle of the semester. If you want to work on theory, please refer to the CS 542 site for the guidelines (though you are expected to spend less effort than on a CS 542 project) and the list of seed papers.

Academic Integrity
Jeff Erickson has a good page on this. TL;DR from him: “Be honest. Cite your sources. We mean it. If you need help, please ask.”

Late Policy
Late homework will not be accepted. Instead, your lowest homework score will be dropped. Additional exceptions will be granted only on a case-by-case basis when compelling reasons are presented (e.g., documented emergencies).

Disability
Please let me know as soon as possible if you need disability-related accommodations.

Tentative List of Topics

MDP basics.
Planning: value iteration, policy iteration, and their analyses.
Model-based and value-based learning algorithms: certainty-equivalence, Q-learning, TD.
Policy gradient.
Importance sampling and off-policy evaluation.
State abstractions.
Partial observability.