Hi, this is Nan Jiang (姜楠). I am a machine learning researcher.
I work on building the theoretical foundations of reinforcement learning (RL), especially in the function-approximation setting.

Prospective students: please read this note.
I am open to collaboration on applying RL to domain X: note


2024 – Now   Associate Professor, CS @ UIUC
2018 – 2024   Assistant Professor, CS @ UIUC
2017 – 2018   Postdoctoral Researcher, MSR NYC
2011 – 2017   PhD, CSE @ UMich


3322 Siebel Center            CV  
nanjiang at illinois dot edu
 
nanjiang_cs (a collection of some useful(?) tweets)  

Services and Recognitions

Journal Editing and Conference Area Chair

  • STS Associate Editor (RL Special Issue, 2024)
  • JMLR Action Editor (2024 – )
  • FnT in ML Editor (2023 – )
  • Senior Area Chair for ICML (2025 – ), ICLR (2024 – ), RLC (2025 – )
  • Area Chair for ICML (2019 – 2024), NeurIPS (2020 – )

Research Awards

Teaching Awards

  • Teaching Excellence with Outstanding Award (CS 598 F20)
  • Teaching Excellence (CS 542: F22, F21, S19, F18; CS 443: S24, S23, S21)

Selected Publications

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation [arXiv, slides, Simons talk]
(NeurIPS-24) Yuheng Zhang, Nan Jiang.
New coverage concepts identified for model-free OPE in POMDPs.

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs [arXiv]
(NeurIPS-23, spotlight) Masatoshi Uehara*, Haruka Kiyohara*, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun.
Modernizing the PSR idea and turning it into a framework that admits model-free function approximation.

Reinforcement Learning in Low-Rank MDPs with Density Features [arXiv]
(ICML-23) Audrey Huang*, Jinglin Chen*, Nan Jiang.
Clean results obtained by novel error induction analysis for taming error exponentiation.

Offline Reinforcement Learning with Realizability and Single-policy Concentrability [arXiv]
(COLT-22) Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee.
Behavior regularization is the key to avoiding degenerate saddle points under function approximation.

Adversarially Trained Actor Critic for Offline Reinforcement Learning [arXiv]
(ICML-22, Outstanding Paper Runner-up) Ching-An Cheng*, Tengyang Xie*, Nan Jiang, Alekh Agarwal.
Bellman-consistent pessimism meets the robust policy improvement of imitation learning.

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning [arXiv, code]
(NeurIPS-21) Siyuan Zhang, Nan Jiang.
BVFT shows promising empirical performance for offline policy selection.

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function [arXiv]
(COLT-21) Gellert Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári.
Cute tensorization trick for generative model + linear V*.

Batch Value-function Approximation with Only Realizability [arXiv, talk]
(ICML-21) Tengyang Xie, Nan Jiang.
Learning Q* from a realizable and otherwise arbitrary function class, which was believed to be impossible.

Minimax Weight and Q-Function Learning for Off-Policy Evaluation [arXiv]
(ICML-20) Masatoshi Uehara, Jiawei Huang, Nan Jiang.
Learning importance weights and value functions from each other, with connections to many old and new algorithms in RL.

Information-Theoretic Considerations in Batch Reinforcement Learning [pdf, poster, MSR talk, Simons talk]
(ICML-19) Jinglin Chen, Nan Jiang.
Revisiting some fundamental aspects of value-based RL.

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable [ICML version, arXiv, errata, poster, talk video]
(ICML-17) Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire.
A new and general theory of exploration in RL with function approximation.

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning [pdf, poster]
(ICML-16) Nan Jiang, Lihong Li.
Simple and effective improvement of importance sampling via control variates.


Talk on BVFT and Bellman-consistent pessimism