I occasionally mention useful/fun facts about or related to RL theory on Twitter. I often refer to these tweets in conversations and have a hard time finding them again, so I decided to gather the links and put them here.
Model-based RL: on issues with the MuZero loss and error compounding
Connection between robust and offline RL (see sketch of proof)
A useful identity about policy loss in MDPs (related identity; see the first sketch below this list)
Why you can’t tune delta adaptively in concentration bounds (h/t Ziyu Xu; a sketch below)
The correct notion of coverage (avg-to-sq; my reading is sketched below)
Visualization of realizability vs. Bellman-completeness (definitions below)
Return max and regret min are not the same in offline RL
Occupancy of history-dependent policies can be reproduced by Markov policies (the construction is sketched below)
None of the standard MDP formulations capture Atari games (my old paper, which might explain some of the gap)
Computing a good policy using LP: the dual seems more robust than the primal? (both LPs are stated below)
The “correct” proof of FQI (different from my ICML’19 paper; I now include a more detailed sketch in my course notes; the algorithm itself is written out below)
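Some of the items above are easier to parse with the underlying math in front of you, so here are a few sketches. First, the policy-loss identity: the canonical statement of this flavor is the performance-difference lemma, which I state from memory (with $J(\pi)$ the expected discounted return, $d^{\pi}$ the normalized discounted state occupancy, and $A^{\pi'}$ the advantage function):

```latex
% Performance-difference lemma (Kakade & Langford, 2002):
% for any two policies \pi and \pi',
J(\pi) - J(\pi')
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi}}\,
    \mathbb{E}_{a \sim \pi(\cdot \mid s)}
    \left[ A^{\pi'}(s, a) \right],
% where d^{\pi}(s) = (1-\gamma) \sum_{t \ge 0} \gamma^{t} \Pr(s_t = s \mid \pi)
% and   A^{\pi'}(s, a) = Q^{\pi'}(s, a) - V^{\pi'}(s).
```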
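On adaptive delta: the point already shows up in Hoeffding’s inequality for i.i.d. $X_1, \dots, X_n \in [0,1]$. The bound holds for each $\delta$ fixed before seeing the data; a data-dependent $\delta$ requires paying extra for uniformity (e.g., a union bound over a grid of $\delta$ values), which is why there is no free adaptive tuning:

```latex
% Hoeffding: for i.i.d. X_1, ..., X_n in [0,1] with mean \mu,
% and for any \delta \in (0,1) chosen IN ADVANCE,
\Pr\left( \left| \bar{X}_n - \mu \right|
  \le \sqrt{ \frac{\ln(2/\delta)}{2n} } \right) \ge 1 - \delta .
% The quantifier order matters: fix \delta first, then the event holds
% with high probability. A \delta chosen as a function of the data sits
% outside this guarantee unless the bound is proven simultaneously for
% all \delta in some set.
```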
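On coverage, my reading of “avg-to-sq” (my paraphrase, not the tweet’s wording): with $d^{\pi}$ the target occupancy and $\mu$ the data distribution, the plain average density ratio under $\mu$ is vacuous, and the meaningful summaries are the sup ratio and the squared (equivalently, $d^{\pi}$-averaged) ratio:

```latex
% Classical L_infinity concentrability:
C_\infty = \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)} .
% The plain average under \mu carries no information:
\mathbb{E}_{\mu}\left[ \frac{d^{\pi}}{\mu} \right] = 1 \quad \text{always.}
% The useful average-type quantity squares the ratio, which is the same
% as averaging the ratio under d^{\pi} instead of \mu:
C_2 = \mathbb{E}_{\mu}\left[ \left( \frac{d^{\pi}}{\mu} \right)^{2} \right]
    = \mathbb{E}_{d^{\pi}}\left[ \frac{d^{\pi}}{\mu} \right]
    \le C_\infty .
```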
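For realizability vs. Bellman-completeness, the standard definitions, for a Q-function class $\mathcal{F}$ and the Bellman optimality operator $\mathcal{T}$:

```latex
% Realizability: the class contains the optimal Q-function.
Q^{\star} \in \mathcal{F} .
% Bellman-completeness: the class is closed under the Bellman operator,
\forall f \in \mathcal{F}: \;\; \mathcal{T} f \in \mathcal{F},
\quad \text{where} \quad
(\mathcal{T} f)(s,a) = R(s,a)
  + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
    \left[ \max_{a'} f(s', a') \right].
% Note the asymmetry: realizability only gets easier as \mathcal{F} grows,
% while completeness is non-monotone; adding one function to \mathcal{F}
% can break it.
```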
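The occupancy-matching statement is classical (it appears, e.g., in Puterman’s book); the construction marginalizes the history-dependent policy one time step at a time:

```latex
% Given a history-dependent policy \pi_h, define the non-stationary
% Markov policy
\pi_t(a \mid s) = \Pr^{\pi_h}\left( a_t = a \mid s_t = s \right)
% (for states with \Pr^{\pi_h}(s_t = s) > 0). Induction over t gives
\Pr^{(\pi_t)_t}(s_t = s, a_t = a) = \Pr^{\pi_h}(s_t = s, a_t = a)
\quad \forall t, s, a,
% so any objective depending only on the marginals of (s_t, a_t),
% such as expected discounted return, is reproduced exactly.
```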
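For the LP item, the two standard formulations for a discounted MDP with initial distribution $\mu_0$ (the robustness comparison is about solving these approximately; the phrasing here is mine):

```latex
% Primal ("value") LP, variables V(s):
\min_{V} \;(1-\gamma) \sum_{s} \mu_0(s)\, V(s)
\quad \text{s.t.} \quad
V(s) \ge R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s')
\;\; \forall s, a .
% Dual ("occupancy") LP, variables d(s,a) >= 0:
\max_{d \ge 0} \; \sum_{s,a} d(s,a)\, R(s,a)
\quad \text{s.t.} \quad
\sum_{a} d(s,a) = (1-\gamma)\, \mu_0(s)
  + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a')
\;\; \forall s .
% An optimal dual solution induces a policy via \pi(a \mid s) \propto d(s,a).
```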
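Finally, for readers who haven’t seen FQI written out, a minimal fitted Q-iteration loop in Python. Everything concrete here (the batch format, the one-hot featurization, the random-forest regressor) is my own illustrative choice, not anything from the tweet or the course notes:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # any regressor works

def fitted_q_iteration(batch, n_actions, gamma=0.99, n_iters=50):
    """Minimal FQI sketch. `batch` is a list of (s, a, r, s_next, done)
    tuples, with s and s_next 1-D feature vectors and a an integer action.
    Each iteration regresses (s, a) features onto the one-step Bellman
    backup computed from the previous Q estimate."""
    S = np.array([t[0] for t in batch])
    A = np.array([t[1] for t in batch])
    R = np.array([t[2] for t in batch])
    S_next = np.array([t[3] for t in batch])
    done = np.array([t[4] for t in batch], dtype=float)

    def phi(states, actions):
        # Featurize (s, a) by appending a one-hot action encoding.
        return np.hstack([states, np.eye(n_actions)[actions]])

    q = None  # Q_0 := 0
    for _ in range(n_iters):
        if q is None:
            max_next = np.zeros(len(batch))
        else:
            # max over a' of Q_k(s_next, a') for every transition
            all_q = np.stack(
                [q.predict(phi(S_next, np.full(len(batch), a)))
                 for a in range(n_actions)], axis=1)
            max_next = all_q.max(axis=1)
        targets = R + gamma * (1.0 - done) * max_next  # Bellman backup
        q = RandomForestRegressor(n_estimators=50).fit(phi(S, A), targets)
    return q  # greedy policy: argmax_a q.predict(phi(s, a))
```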