Introduction to reinforcement learning (RL). For a more theoretical version of the course, see CS598 here.

All slides, notes, and deadlines will be posted on this website.

Date | Lecture | Comments |
---|---|---|
08/27 | Introduction to the course (slides) | |
08/29 | MDP basics (slides) | See reading at the end of slides |
09/03 | MDP basics (see updated slides for 08/29) | reading about different MDP formulations |
09/05 | Value Iteration (slides) | HW0 due before class |
09/10 | VI (ref for proofs: 598note1) | |
09/12 | Policy Iteration & LP (slides, picture proof) | HW1 released |
09/17 | Monte-Carlo value prediction (slides) | reading: Sec 3.1 of Szepesvári |
09/19 | TD (slides) | |
09/24 | Review Session | |
09/26 | TD(λ), function approximation (slides) | HW1 due before class |
10/01 | Func approx | HW2 released |
10/03 | SARSA and Q-learning (slides) | |
10/08 | Wrap-up of TD, Importance Sampling (slides) | Quiz |
10/10 | IS (ref for proofs: 598note6) | |
10/15 | PG (slides, onenote) | HW2 due before class |
10/22 | Abstractions (slides; updated 10/22 11pm) | Li et al’06 |
10/24 | Abstractions (ref for proof: 598note4, onenote) | |
10/29 | Abstractions (onenote, Gordon’95, Homomorphisms) | HW3 due before class |
10/31 | Exploration (onenote, ref for proof: 598note2) | reading: Sec 4.2 of Szepesvári |
11/05 | Exploration (slides; updated 11/12 10am) | |
11/12 | Exploration | |
11/14 | Review Session | HW4 due before class |
11/19 | Late Mid Exam | |
11/21 | Partial observability (slides; updated: 12/2 8pm) | |
12/03 | Partial observability | |
12/05 | Bayesian RL (slides) | |
12/10 | Imitation Learning (slides) | Last day of class |
12/15 | No class | 4-credit final report due |

**Prerequisites**

Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446) is highly recommended.

**Piazza**

Please self-enroll here.

**Time & Location**

Tue & Thu, 2-3:15pm, 0216 Siebel.

**TAs & Office Hours**

Jinglin Chen: 5-6pm Tuesday, 0216 Siebel.

Philip Amortila: 10-11am Friday, 3403 Siebel. (Exceptions: Oct 25 and Nov 1 OHs will be held in 4403.)

**Coursework & Grading**

For 3-credit students: Your grade will consist of 3 components:

- Homework (50%): There will be *roughly* 5 homework assignments, including both written and coding assignments.
- Participation (15%): Part of this *might* take the form of pop quizzes given in class without prior announcement.
- Final/Late Mid Exam (35%): There will be a final exam (or a mid exam that is relatively late). Date TBA.

For 4-credit students: You will additionally need to work on a final project (20%; the points of the other components will be reduced proportionally). You can either work on your own or in a team of two. The project should be about reproducing the theoretical analysis or the empirical experiments of a published paper on RL; you do not need to reproduce the full paper and can be selective about which part you work on. You are expected to discuss your choice of topic with me in the middle of the semester. For those who want to work on theory, please refer to the CS598 site for the guidelines (though you are expected to spend less effort than on the CS598 project) and the list of seed papers.
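As a rough illustration of the 4-credit weighting, the sketch below rescales the 3-credit weights. It assumes "reduced proportionally" means every other component is scaled by the same factor so that all weights still sum to 100%. The variable names are hypothetical, and the official weights are whatever the instructor announces.

```python
# Base weights (in percent) for 3-credit students, from the grading breakdown.
base_weights = {"homework": 50, "participation": 15, "exam": 35}
project_weight = 20  # final project share for 4-credit students

# Assumption: each remaining component is scaled by (100 - 20) / 100 = 0.8.
four_credit_weights = {
    name: weight * (100 - project_weight) / 100
    for name, weight in base_weights.items()
}
four_credit_weights["project"] = project_weight

for name, weight in four_credit_weights.items():
    print(f"{name}: {weight:g}%")
```

Under this reading, the 4-credit breakdown would be homework 40%, participation 12%, exam 28%, and project 20%.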

**Academic Integrity**

Jeff Erickson has a good page on this. TL;DR from him: “Be honest. Cite your sources. We mean it. If you need help, please ask.”

**Late Policy**

Late homework will not be accepted. Instead, your lowest homework score will be dropped. Additional exceptions will be granted only on a case-by-case basis when compelling reasons are presented (e.g., documented emergencies).

**Disability**

Please let me know as soon as possible if you need accommodations for disability.

**Textbook**

We will not follow a specific textbook, but readings may be assigned from the following textbooks, whose PDFs are freely available online.

- Reinforcement Learning: An Introduction, by Rich Sutton and Andrew Barto. (draft available online)
- Algorithms of Reinforcement Learning, by Csaba Szepesvári. (pdf available online)

**Tentative List of Topics**

- MDP basics.
- Planning: value iteration, policy iteration, and their analyses.
- Model-based and value-based learning algorithms: certainty-equivalence, Q-learning, TD.
- Policy gradient.
- Importance sampling and off-policy evaluation.
- State abstractions.
- Partial observability.