Han Zhong
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
arXiv preprint arXiv:2205.15512, 2022
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
H Zhong, Z Yang, Z Wang, MI Jordan
Journal of Machine Learning Research 24 (35), 1-52, 2023
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
International Conference on Machine Learning, 27117-27142, 2022
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
X Chen, H Zhong, Z Yang, Z Wang, L Wang
International Conference on Machine Learning, 3773-3793, 2022
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
International Conference on Machine Learning, 24496-24523, 2022
Why robust generalization in deep learning is difficult: Perspective of expressive power
B Li, J Jin, H Zhong, J Hopcroft, L Wang
Advances in Neural Information Processing Systems 35, 4370-4384, 2022
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
H Zhong, Z Yang, Z Wang, C Szepesvári
arXiv preprint arXiv:2110.08984, 2021
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
Thirty-seventh Conference on Neural Information Processing Systems, 2023
A theoretical analysis of optimistic proximal policy optimization in linear Markov decision processes
H Zhong, T Zhang
Advances in Neural Information Processing Systems 36, 2024
Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage
J Blanchet, M Lu, T Zhang, H Zhong
Advances in Neural Information Processing Systems 36, 2024
Nearly optimal policy optimization with stable at any time guarantee
T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao
International Conference on Machine Learning, 24243-24265, 2022
Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF
W Xiong, H Dong, C Ye, H Zhong, N Jiang, T Zhang
arXiv preprint arXiv:2312.11456, 2023
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
International Conference on Learning Representations, 2021/9/29, 2021
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
H Zhong, J Huang, L Yang, L Wang
Advances in Neural Information Processing Systems 34, 2021
Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
J Hu, H Zhong, C Jin, L Wang
arXiv preprint arXiv:2210.15598, 2022
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
H Zhong, X Deng, EX Fang, Z Yang, Z Wang, R Li
arXiv preprint arXiv:2012.14098, 2020
A reduction-based framework for sequential decision making with delayed feedback
Y Yang, H Zhong, T Wu, B Liu, L Wang, SS Du
Advances in Neural Information Processing Systems 36, 2024
Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
H Zhong, J Hu, Y Xue, T Li, L Wang
arXiv preprint arXiv:2302.10796, 2023
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
G Feng, H Zhong
arXiv preprint arXiv:2312.17248, 2023