Introduction to reinforcement learning (RL).
Previous semesters (as CS 498): S21, F19.
Also see CS 542 for a more theoretical version of the course.
All slides, notes, and deadlines will be posted on this website.
Date | Lecture | Comments |
---|---|---|
01/17 | Introduction | slides |
01/19 | MDP formulation | slides |
01/24 | Value function | note |
01/26 | Bellman equation | |
01/31 | Optimality | |
02/02 | Value Iteration | blackboard |
02/07 | VI (cont) | blackboard |
02/09 | Policy Iteration | blackboard, HW1 due EOW |
02/14 | PI (cont), LP | blackboard |
02/16 | Learning settings | slides, blackboard |
02/21 | Value prediction | slides, reading: Sec 3.1 of Szepesvári |
02/23 | TD | blackboard, HW2 due EOW |
02/28 | Function approximation | slides, blackboard, reading: Sec 3.2 of Szepesvári |
03/02 | Control | slides |
03/07 | Off-policy learning | |
03/09 | In-class OH | |
03/21 | Importance sampling | reference, slides, blackboard |
03/23 | IS | HW3 due EOW |
03/28 | Policy gradient | blackboard |
03/30 | PG | slides |
04/04 | State abstraction | slides |
04/06 | State abstraction | ref, HW4 due EOW |
04/11 | In-class OH | |
04/13 | In-class OH | |
04/18 | Exam | 1:50-3:25pm |
04/20 | Exploration | slides |
04/25 | Partial observability | slides |
04/27 | Bayesian RL | slides |
05/02 | Imitation learning | slides, 4-credit project due 05/07 EOD |
Prerequisites
Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446) is highly recommended.
Time & Location
Tue & Thu, 2-3:15pm. 1306 Everitt Lab.
In special circumstances we will meet over Zoom instead (this will be announced in advance); see the Canvas announcement for the Zoom link.
Lecture recording
Please see this channel on Mediaspace. You can subscribe to it to be automatically notified of new recordings.
Textbook
We will not follow a specific textbook, but readings may be assigned from the following textbooks, whose PDFs are freely available online.
Canvas
Here. All announcements (including Assignments when they are created) will be made through Canvas, so make sure you can receive system notification emails from it.
TAs & Office Hours
Audrey Huang and Philip Amortila. OH: 1pm Friday.
Coursework & Grading
For 3-credit students: your grade will consist of 2 components:
For 4-credit students: you will additionally need to complete a final project (20%; the points of the other components will be reduced proportionally). You can work either on your own or in a team of two. The project should reproduce the theoretical analysis or the empirical experiments of a published paper on RL; you do not need to reproduce the full paper and can be selective about which part you work on. You are expected to discuss your choice of topic with me around the middle of the semester. If you want to work on theory, please refer to the CS 542 site for the guidelines (though you are expected to spend less effort than on a CS 542 project) and the list of seed papers.
Academic Integrity
Jeff Erickson has a good page on this. TL;DR from him: “Be honest. Cite your sources. We mean it. If you need help, please ask.”
Late Policy
Late homework will not be accepted. Instead, your lowest homework score will be dropped. Additional exceptions will be granted only on a case-by-case basis when compelling reasons are presented (e.g., documented emergencies).
Disability
Please let me know as soon as possible if you need accommodations for disability.
Tentative List of Topics