Theory of reinforcement learning (RL), with a focus on sample complexity analyses.
Archived pages for previous semesters: F18
Date | Lecture | Comments |
---|---|---|
01/15 | Overview, logistics, and MDP basics | slides, reading hw1 |
01/22 | Value Iteration | note1 |
01/29 | Policy Iteration | pictorial proof (formal proof in note1) |
02/05 | The learning setting | slides |
02/07 | Hoeffding’s, MAB, lower bound | note2 |
02/12 | Sample complexity of certainty-equivalence | note3 |
02/19 | State abstractions | slides, note4, reading: MDP homomorphisms |
02/28 | Fitted Q-Iteration | slides, note5, Project proposal due |
03/07 | Homework 2 due | |
03/14 | Importance Sampling and Policy Gradient | note6 |
04/20 | No class | |
Time & Location
Tue & Thu, 12:30-01:45pm, 1304 Siebel.
TA
Jinglin Chen.
Office Hours
By appointment. My office is 3322 Siebel.
Prerequisites
Linear algebra, probability & statistics, and basic calculus. Experience with machine learning (e.g., CS 446), and preferably with reinforcement learning. Familiarity with stochastic processes and numerical analysis is also recommended.
Coursework & Grading
Homework may be assigned on an ad hoc basis to help students digest particular material. The main assignment will be a course project that involves literature review, reproduction of theoretical analyses in existing work, and original research (see details below). No exams. Note that this is a grad-level seminar course; if you are concerned about precise grading criteria, this is probably not the right class for you.
Topics Covered in Lectures
Course Project
You will work individually. You can choose one of the following three types of projects:
1. Reproduce the proofs of existing paper(s). You must fully understand the proofs and rewrite them in your own words. Sometimes a paper considers a relatively general setting and the analysis can be quite complicated. In this case you should aim to scrutinize the results and present them in the cleanest possible way. Ask yourself: What’s the most essential part of the analysis? Can you introduce simplifying assumptions that shorten the proofs significantly without trivializing the results?
2. Novel research. Pick a new research topic and work on it. Be sure to discuss it with me before you settle on the topic. The project must contain a significant theoretical component.
3. Something between 1 & 2. I would encourage most of you to start in this category. The idea is to reproduce the proofs of existing results and see if you can extend the analysis to a more challenging and/or interesting setting. This way, even if you do not get the new results before the end of the semester, your project can fall back to category 1.
See the link at the top of this page for potential topics. You are expected to submit a short project proposal in the middle of the semester. The proposal should consist of a short paragraph describing your project topic, the papers you plan to work on, and the original research question (if applicable).
Resources
Useful inequalities cheat sheet (by László Kozma)
Concentration of measure (by John Lafferty, Han Liu, and Larry Wasserman)
We will not follow a specific textbook, but here are some good books that you can consult:
There are also many related courses whose material is available online. Here is an incomplete list of them (not in any particular order):