Reinforcement Learning: Theory and Algorithms [working draft]
(Monograph) Alekh Agarwal, Nan Jiang, Sham Kakade, Wen Sun. (remark)
(* = Equal contribution or alphabetical)
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees [pdf]
(STS, invited submission under review) Nan Jiang, Tengyang Xie.
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs [openreview]
(ICLR-25) Yuheng Zhang, Nan Jiang.
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [openreview]
(ICLR-25) Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu.
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation [arXiv, slides, Simons talk]
(NeurIPS-24) Yuheng Zhang, Nan Jiang.
Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity [arXiv]
(NeurIPS-24, oral presentation) Philip Amortila*, Dylan J Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi.
Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality [coming soon]
(NeurIPS-24) Audrey Huang, Nan Jiang.
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model [arXiv]
(NeurIPS-24) Chenlu Ye*, Wei Xiong*, Yuheng Zhang*, Hanze Dong*, Nan Jiang, Tong Zhang.
A Note on Loss Functions and Error Compounding in Model-based Reinforcement Learning [arXiv]
(Technical Note) Nan Jiang.
RLHF Workflow: From Reward Modeling to Online RLHF [arXiv]
(TMLR-24) Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang.
Mitigating the Alignment Tax of RLHF [arXiv]
(EMNLP-24) Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang.
Non-adaptive Online Finetuning for Offline Reinforcement Learning [openreview]
(RLC-24) Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik.
Word Embeddings Are Steers for Language Models [openreview]
(ACL-24) Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek F. Abdelzaher, Heng Ji.
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint [arXiv]
(ICML-24) Wei Xiong*, Hanze Dong*, Chenlu Ye*, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang.
Harnessing Density Ratios for Online Reinforcement Learning [arXiv]
(ICLR-24, spotlight) Philip Amortila*, Dylan Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie.
Model-free Representation Learning and Exploration in Low-rank MDPs [pdf]
(JMLR-24) Aditya Modi*, Jinglin Chen*, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal.
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs [arXiv]
(NeurIPS-23, spotlight) Masatoshi Uehara*, Haruka Kiyohara*, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun.
Adversarial Model for Offline Reinforcement Learning [arXiv]
(NeurIPS-23) Mohak Bhardwaj*, Tengyang Xie*, Byron Boots, Nan Jiang, Ching-An Cheng.
Marginalized Importance Sampling for Off-Environment Policy Evaluation [arXiv]
(CoRL-23) Pulkit Katdare, Nan Jiang, Katherine Driggs-Campbell.
Reinforcement Learning in Low-Rank MDPs with Density Features [arXiv]
(ICML-23) Audrey Huang*, Jinglin Chen*, Nan Jiang.
The Optimal Approximation Ratios in Misspecified Off-Policy Value Function Estimation [arXiv (stronger results than the conference version)]
(ICML-23) Philip Amortila, Nan Jiang, Csaba Szepesvári.
Offline Learning in Markov Games with General Function Approximation [arXiv]
(ICML-23) Yuheng Zhang, Yu Bai, Nan Jiang.
The Role of Coverage in Online Reinforcement Learning [arXiv]
(ICLR-23, notable top-5%) Tengyang Xie*, Dylan J Foster*, Yu Bai, Nan Jiang, Sham Kakade.
Explaining RL Decisions with Trajectories [openreview]
(ICLR-23) Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian.
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions [arXiv]
(NeurIPS-22) Audrey Huang, Nan Jiang.
A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation [arXiv]
(NeurIPS-22) Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster.
On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL [arXiv]
(NeurIPS-22) Jinglin Chen*, Aditya Modi*, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal.
Interaction-Grounded Learning with Action-inclusive Feedback [arXiv]
(NeurIPS-22) Tengyang Xie*, Akanksha Saran*, Dylan J Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford.
Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret [arXiv]
(NeurIPS-22) Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu.
Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps [arXiv]
(UAI-22) Jinglin Chen, Nan Jiang.
Offline Reinforcement Learning with Realizability and Single-policy Concentrability [arXiv]
(COLT-22) Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee.
Adversarially Trained Actor Critic for Offline Reinforcement Learning [arXiv]
(ICML-22, Outstanding Paper Runner-up) Ching-An Cheng*, Tengyang Xie*, Nan Jiang, Alekh Agarwal.
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes [arXiv]
(ICML-22) Chengchun Shi*, Masatoshi Uehara*, Jiawei Huang, Nan Jiang.
Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [arXiv]
(ICLR-22) Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu.
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction [arXiv]
(AISTATS-22) Jiawei Huang, Nan Jiang.
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning [arXiv, code]
(NeurIPS-21) Siyuan Zhang, Nan Jiang.
Bellman-consistent Pessimism for Offline Reinforcement Learning [arXiv]
(NeurIPS-21, oral presentation) Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal.
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning [arXiv]
(NeurIPS-21) Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai.
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning [arXiv]
(NeurIPS-21 Datasets and Benchmarks) Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue.
On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function [arXiv]
(COLT-21) Gellert Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári.
Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency [arXiv]
(preprint) Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie.
A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting [arXiv]
(Technical Note) Philip Amortila*, Nan Jiang, Tengyang Xie.
Batch Value-function Approximation with Only Realizability [arXiv, talk]
(ICML-21) Tengyang Xie, Nan Jiang.
Minimax Model Learning [arXiv]
(AISTATS-21) Cameron Voloshin, Nan Jiang, Yisong Yue.
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration [arXiv]
(AAAI-21) Priyank Agrawal*, Jinglin Chen*, Nan Jiang.
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization [arXiv]
(NeurIPS-20) Nan Jiang, Jiawei Huang.
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison [arXiv]
(UAI-20) Tengyang Xie, Nan Jiang.
Minimax Weight and Q-Function Learning for Off-Policy Evaluation [arXiv]
(ICML-20) Masatoshi Uehara, Jiawei Huang, Nan Jiang.
From Importance Sampling to Doubly Robust Policy Gradient [arXiv]
(ICML-20) Jiawei Huang, Nan Jiang.
On Value Functions and the Agent-Environment Boundary [arXiv]
(Technical Note) Nan Jiang.
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles [arXiv]
(AISTATS-20) Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh.
Provably Efficient Q-Learning with Low Switching Cost [arXiv]
(NeurIPS-19) Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang.
Deterministic Bellman Residual Minimization [pdf]
(OptRL Workshop at NeurIPS-19) Ehsan Saleh, Nan Jiang.
Information-Theoretic Considerations in Batch Reinforcement Learning [pdf, poster, MSR talk, Simons talk]
(ICML-19) Jinglin Chen, Nan Jiang.
Provably Efficient RL with Rich Observations via Latent State Decoding [arXiv]
(ICML-19) Simon Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford.
Model-based RL in CDPs: PAC bounds and Exponential Improvements over Model-free Approaches [arXiv]
(COLT-19) Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford.
On Oracle-Efficient PAC RL with Rich Observations [arXiv]
(NeurIPS-18, spotlight talk) Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire.
Completing State Representations using Spectral Learning [pdf, code, poster]
(NeurIPS-18) Nan Jiang, Alex Kulesza, Satinder Singh.
Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon [pdf]
(COLT-18) Nan Jiang, Alekh Agarwal.
Hierarchical Imitation and Reinforcement Learning [arXiv]
(ICML-18) Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III.
Markov Decision Processes with Continuous Side Information [arXiv]
(ALT-18) Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari.
PAC Reinforcement Learning with an Imperfect Model [pdf, poster]
(AAAI-18) Nan Jiang.
Repeated Inverse Reinforcement Learning [arXiv, errata, poster, talk video]
(NeurIPS-17, spotlight talk) Kareem Amin*, Nan Jiang*, Satinder Singh.
Contextual Decision Processes with Low Bellman Rank are PAC-Learnable [ICML version, arXiv, errata, poster, talk video]
(ICML-17) Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire.
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning [pdf, poster]
(ICML-16) Nan Jiang, Lihong Li.
On Structural Properties of MDPs that Bound Loss due to Shallow Planning [pdf]
(IJCAI-16) Nan Jiang, Satinder Singh, Ambuj Tewari.
Improving Predictive State Representations via Gradient Descent [pdf, poster]
(AAAI-16) Nan Jiang, Alex Kulesza, Satinder Singh.
Abstraction Selection in Model-based Reinforcement Learning [pdf, talk video]
(ICML-15) Nan Jiang, Alex Kulesza, Satinder Singh.
The Dependence of Effective Planning Horizon on Model Accuracy [pdf, errata, poster, talk video]
(AAMAS-15, Best Paper Award) Nan Jiang, Alex Kulesza, Satinder Singh, Richard Lewis.
Low-Rank Spectral Learning with Weighted Loss Functions [pdf]
(AISTATS-15) Alex Kulesza, Nan Jiang, Satinder Singh.
Spectral Learning of Predictive State Representations with Insufficient Statistics [pdf]
(AAAI-15) Alex Kulesza, Nan Jiang, Satinder Singh.
Improving UCT Planning via Approximate Homomorphisms [pdf, supplement]
(AAMAS-14) Nan Jiang, Satinder Singh, Richard Lewis.
A Theory of Model Selection in Reinforcement Learning [pdf]
(2017) Nan Jiang.