I occasionally mention useful/fun facts about or related to RL theory on Twitter. I often refer to these tweets in conversations and have a hard time finding them again, so I decided to gather the links and put them here.
Model-based RL: on issues with the MuZero loss and error compounding
Connection between robust and offline RL (see sketch of proof)
A useful identity about policy loss in MDPs (related identity; see the first sketch below this list)
Why you can’t tune delta adaptively in concentration bounds (h/t Ziyu Xu; a sketch below)
The correct notion of coverage (avg-to-sq; my reading is sketched below)
Visualization of realizability vs. Bellman-completeness (definitions below)
Return max and regret min are not the same in offline RL
Occupancy of history-dependent policies can be reproduced by Markov policies (the construction is sketched below)
None of the standard MDP formulations capture Atari games (my old paper, which might explain some of the gap)
Computing a good policy using LP: the dual seems more robust than the primal? (both LPs are stated below)
The “correct” proof of FQI (different from my ICML’19 paper; I now include a more detailed sketch in my course notes; the algorithm itself is written out below)
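Some of the items above are easier to parse with the underlying math in front of you, so here are a few sketches. First, the policy-loss identity: the canonical statement of this flavor is the performance-difference lemma, which I state from memory (with $J(\pi)$ the expected discounted return, $d^{\pi}$ the normalized discounted state occupancy, and $A^{\pi'}$ the advantage function):

```latex
% Performance-difference lemma (Kakade & Langford, 2002):
% for any two policies \pi and \pi',
J(\pi) - J(\pi')
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi}}\,
    \mathbb{E}_{a \sim \pi(\cdot \mid s)}
    \left[ A^{\pi'}(s, a) \right],
% where d^{\pi}(s) = (1-\gamma) \sum_{t \ge 0} \gamma^{t} \Pr(s_t = s \mid \pi)
% and   A^{\pi'}(s, a) = Q^{\pi'}(s, a) - V^{\pi'}(s).
```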
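On adaptive delta: the point already shows up in Hoeffding’s inequality for i.i.d. $X_1, \dots, X_n \in [0,1]$. The bound holds for each $\delta$ fixed before seeing the data; a data-dependent $\delta$ requires paying extra for uniformity (e.g., a union bound over a grid of $\delta$ values), which is why there is no free adaptive tuning:

```latex
% Hoeffding: for i.i.d. X_1, ..., X_n in [0,1] with mean \mu,
% and for any \delta \in (0,1) chosen IN ADVANCE,
\Pr\left( \left| \bar{X}_n - \mu \right|
  \le \sqrt{ \frac{\ln(2/\delta)}{2n} } \right) \ge 1 - \delta .
% The quantifier order matters: fix \delta first, then the event holds
% with high probability. A \delta chosen as a function of the data sits
% outside this guarantee unless the bound is proven simultaneously for
% all \delta in some set.
```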
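On coverage, my reading of “avg-to-sq” (my paraphrase, not the tweet’s wording): with $d^{\pi}$ the target occupancy and $\mu$ the data distribution, the plain average density ratio under $\mu$ is vacuous, and the meaningful summaries are the sup ratio and the squared (equivalently, $d^{\pi}$-averaged) ratio:

```latex
% Classical L_infinity concentrability:
C_\infty = \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)} .
% The plain average under \mu carries no information:
\mathbb{E}_{\mu}\left[ \frac{d^{\pi}}{\mu} \right] = 1 \quad \text{always.}
% The useful average-type quantity squares the ratio, which is the same
% as averaging the ratio under d^{\pi} instead of \mu:
C_2 = \mathbb{E}_{\mu}\left[ \left( \frac{d^{\pi}}{\mu} \right)^{2} \right]
    = \mathbb{E}_{d^{\pi}}\left[ \frac{d^{\pi}}{\mu} \right]
    \le C_\infty .
```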
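For realizability vs. Bellman-completeness, the standard definitions, for a Q-function class $\mathcal{F}$ and the Bellman optimality operator $\mathcal{T}$:

```latex
% Realizability: the class contains the optimal Q-function.
Q^{\star} \in \mathcal{F} .
% Bellman-completeness: the class is closed under the Bellman operator,
\forall f \in \mathcal{F}: \;\; \mathcal{T} f \in \mathcal{F},
\quad \text{where} \quad
(\mathcal{T} f)(s,a) = R(s,a)
  + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
    \left[ \max_{a'} f(s', a') \right].
% Note the asymmetry: realizability only gets easier as \mathcal{F} grows,
% while completeness is non-monotone; adding one function to \mathcal{F}
% can break it.
```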
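The occupancy-matching statement is classical (it appears, e.g., in Puterman’s book); the construction marginalizes the history-dependent policy one time step at a time:

```latex
% Given a history-dependent policy \pi_h, define the non-stationary
% Markov policy
\pi_t(a \mid s) = \Pr^{\pi_h}\left( a_t = a \mid s_t = s \right)
% (for states with \Pr^{\pi_h}(s_t = s) > 0). Induction over t gives
\Pr^{(\pi_t)_t}(s_t = s, a_t = a) = \Pr^{\pi_h}(s_t = s, a_t = a)
\quad \forall t, s, a,
% so any objective depending only on the marginals of (s_t, a_t),
% such as expected discounted return, is reproduced exactly.
```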
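For the LP item, the two standard formulations for a discounted MDP with initial distribution $\mu_0$ (the robustness comparison is about solving these approximately; the phrasing here is mine):

```latex
% Primal ("value") LP, variables V(s):
\min_{V} \;(1-\gamma) \sum_{s} \mu_0(s)\, V(s)
\quad \text{s.t.} \quad
V(s) \ge R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s')
\;\; \forall s, a .
% Dual ("occupancy") LP, variables d(s,a) >= 0:
\max_{d \ge 0} \; \sum_{s,a} d(s,a)\, R(s,a)
\quad \text{s.t.} \quad
\sum_{a} d(s,a) = (1-\gamma)\, \mu_0(s)
  + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a')
\;\; \forall s .
% An optimal dual solution induces a policy via \pi(a \mid s) \propto d(s,a).
```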
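Finally, for readers who haven’t seen FQI written out, a minimal fitted Q-iteration loop in Python. Everything concrete here (the batch format, the one-hot featurization, the random-forest regressor) is my own illustrative choice, not anything from the tweet or the course notes:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # any regressor works

def fitted_q_iteration(batch, n_actions, gamma=0.99, n_iters=50):
    """Minimal FQI sketch. `batch` is a list of (s, a, r, s_next, done)
    tuples, with s and s_next 1-D feature vectors and a an integer action.
    Each iteration regresses (s, a) features onto the one-step Bellman
    backup computed from the previous Q estimate."""
    S = np.array([t[0] for t in batch])
    A = np.array([t[1] for t in batch])
    R = np.array([t[2] for t in batch])
    S_next = np.array([t[3] for t in batch])
    done = np.array([t[4] for t in batch], dtype=float)

    def phi(states, actions):
        # Featurize (s, a) by appending a one-hot action encoding.
        return np.hstack([states, np.eye(n_actions)[actions]])

    q = None  # Q_0 := 0
    for _ in range(n_iters):
        if q is None:
            max_next = np.zeros(len(batch))
        else:
            # max over a' of Q_k(s_next, a') for every transition
            all_q = np.stack(
                [q.predict(phi(S_next, np.full(len(batch), a)))
                 for a in range(n_actions)], axis=1)
            max_next = all_q.max(axis=1)
        targets = R + gamma * (1.0 - done) * max_next  # Bellman backup
        q = RandomForestRegressor(n_estimators=50).fit(phi(S, A), targets)
    return q  # greedy policy: argmax_a q.predict(phi(s, a))
```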