Reinforcement Learning: Theory and Algorithms [working draft, remark]
(Monograph) Alekh Agarwal, Nan Jiang, Sham Kakade, Wen Sun.


Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [coming soon]
(ICLR-22) Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu.

On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction [arXiv]
(AISTATS-22) Jiawei Huang, Nan Jiang.

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning [arXiv, code]
(NeurIPS-21) Siyuan Zhang, Nan Jiang.

Bellman-consistent Pessimism for Offline Reinforcement Learning [arXiv]
(NeurIPS-21, w/ oral presentation) Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal.

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning [arXiv]
(NeurIPS-21) Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai.

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning [arXiv]
(NeurIPS-21 Datasets and Benchmarks) Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue.

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function [arXiv]
(COLT-21) Gellert Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári.

Model-free Representation Learning and Exploration in Low-rank MDPs [arXiv]
(preprint) Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal.

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency [arXiv]
(preprint) Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie.

A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting [arXiv]
(Technical Note) Philip Amortila, Nan Jiang, Tengyang Xie.

Batch Value-function Approximation with Only Realizability [arXiv, talk]
(ICML-21) Tengyang Xie, Nan Jiang.

Minimax Model Learning [arXiv]
(AISTATS-21) Cameron Voloshin, Nan Jiang, Yisong Yue.

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration [arXiv]
(AAAI-21) Priyank Agrawal, Jinglin Chen, Nan Jiang.

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization [arXiv]
(NeurIPS-20) Nan Jiang, Jiawei Huang.

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison [arXiv]
(UAI-20) Tengyang Xie, Nan Jiang.

Minimax Weight and Q-Function Learning for Off-Policy Evaluation [arXiv]
(ICML-20) Masatoshi Uehara, Jiawei Huang, Nan Jiang.

From Importance Sampling to Doubly Robust Policy Gradient [arXiv]
(ICML-20) Jiawei Huang, Nan Jiang.

On Value Functions and the Agent-Environment Boundary [arXiv]
(Technical Note) Nan Jiang.

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles [arXiv]
(AISTATS-20) Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh.

Provably Efficient Q-Learning with Low Switching Cost [arXiv]
(NeurIPS-19) Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang.

Deterministic Bellman Residual Minimization [pdf]
(OptRL Workshop at NeurIPS-19) Ehsan Saleh, Nan Jiang.

Information-Theoretic Considerations in Batch Reinforcement Learning [pdf, poster, MSR talk, Simons talk]
(ICML-19) Jinglin Chen, Nan Jiang.

Provably Efficient RL with Rich Observations via Latent State Decoding [arXiv]
(ICML-19) Simon Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford.

Model-based RL in CDPs: PAC bounds and Exponential Improvements over Model-free Approaches [arXiv]
(COLT-19) Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford.

On Oracle-Efficient PAC RL with Rich Observations [arXiv]
(NeurIPS-18, w/ spotlight talk) Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire.

Completing State Representations using Spectral Learning [pdf, code, poster]
(NeurIPS-18) Nan Jiang, Alex Kulesza, Satinder Singh.

Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon [pdf]
(COLT-18) Nan Jiang, Alekh Agarwal.

Hierarchical Imitation and Reinforcement Learning [arXiv]
(ICML-18) Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III.

Markov Decision Processes with Continuous Side Information [arXiv]
(ALT-18) Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari.

PAC Reinforcement Learning with an Imperfect Model [pdf, poster]
(AAAI-18) Nan Jiang.

Repeated Inverse Reinforcement Learning [arXiv, errata, poster, talk video]
(NeurIPS-17, w/ spotlight talk) Kareem Amin*, Nan Jiang*, Satinder Singh (*Equal contribution).

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable [ICML version, arXiv, errata, poster, talk video]
(ICML-17) Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire.

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning [pdf, poster]
(ICML-16) Nan Jiang, Lihong Li.

On Structural Properties of MDPs that Bound Loss due to Shallow Planning [pdf]
(IJCAI-16) Nan Jiang, Satinder Singh, Ambuj Tewari.

Improving Predictive State Representations via Gradient Descent [pdf, poster]
(AAAI-16) Nan Jiang, Alex Kulesza, Satinder Singh.

Abstraction Selection in Model-based Reinforcement Learning [pdf, talk video]
(ICML-15) Nan Jiang, Alex Kulesza, Satinder Singh.

The Dependence of Effective Planning Horizon on Model Accuracy [pdf, errata, poster, talk video]
(AAMAS-15, best paper award) Nan Jiang, Alex Kulesza, Satinder Singh, Richard Lewis.

Low-Rank Spectral Learning with Weighted Loss Functions [pdf]
(AISTATS-15) Alex Kulesza, Nan Jiang, Satinder Singh.

Spectral Learning of Predictive State Representations with Insufficient Statistics [pdf]
(AAAI-15) Alex Kulesza, Nan Jiang, Satinder Singh.

Improving UCT Planning via Approximate Homomorphisms [pdf, supplement]
(AAMAS-14) Nan Jiang, Satinder Singh, Richard Lewis.


PhD Thesis

A Theory of Model Selection in Reinforcement Learning [pdf]
(2017) Nan Jiang.