Ziniu Li
Other names: Zi-Niu Li
The Chinese University of Hong Kong, Shenzhen
Verified email at link.cuhk.edu.cn
Title
Cited by
Year
Error bounds of imitating policies and environments
T Xu, Z Li, Y Yu
Advances in Neural Information Processing Systems 33, 15737-15749, 2020
Cited by: 101 · Year: 2020
Error bounds of imitating policies and environments for reinforcement learning
T Xu, Z Li, Y Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10), 6968 …, 2021
Cited by: 36 · Year: 2021
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Z Li, T Xu, Y Zhang, Z Lin, Y Yu, R Sun, ZQ Luo
Forty-first International Conference on Machine Learning, 2024
Cited by: 24* · Year: 2024
Self-Guided Evolution Strategies with Historical Estimated Gradients
FY Liu, ZN Li, C Qian
IJCAI, 1474-1480, 2020
Cited by: 21 · Year: 2020
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Z Li, Y Li, Y Zhang, T Zhang, ZQ Luo
International Conference on Learning Representations, 2022
Cited by: 16 · Year: 2022
Rethinking ValueDice - Does It Really Improve Performance?
Z Li, T Xu, Y Yu, ZQ Luo
ICLR Blog, 2022
Cited by: 14 · Year: 2022
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis
T Xu, Z Li, Y Yu, ZQ Luo
arXiv preprint arXiv:2208.01899, 2022
Cited by: 11* · Year: 2022
When is RL better than DPO in RLHF? A Representation and Optimization Perspective
Z Li, T Xu, Y Yu
ICLR Tiny Paper, 2024
Cited by: 9* · Year: 2024
Why Transformers Need Adam: A Hessian Perspective
Y Zhang, C Chen, T Ding, Z Li, R Sun, ZQ Luo
arXiv preprint arXiv:2402.16788, 2024
Cited by: 9 · Year: 2024
Imitation learning from imperfection: Theoretical justifications and algorithms
Z Li, T Xu, Z Qin, Y Yu, ZQ Luo
Advances in Neural Information Processing Systems 36, 2024
Cited by: 7* · Year: 2024
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
T Xu, Z Li, Y Yu, ZQ Luo
UAI, 2367-2378, 2023
Cited by: 7 · Year: 2023
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
J Xiao, Z Li, X Xie, E Getzen, C Fang, Q Long, WJ Su
arXiv preprint arXiv:2405.16455, 2024
Cited by: 6 · Year: 2024
Adam-mini: Use fewer learning rates to gain more
Y Zhang, C Chen, Z Li, T Ding, C Wu, Y Ye, ZQ Luo, R Sun
arXiv preprint arXiv:2406.16793, 2024
Cited by: 3 · Year: 2024
A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle
Z Li, T Xu, Y Yu
arXiv preprint arXiv:2203.11489, 2022
Cited by: 1 · Year: 2022
Efficient Exploration by Novelty-Pursuit
Z Li, XH Chen
Distributed Artificial Intelligence: Second International Conference, DAI …, 2020
Cited by: 1 · Year: 2020
Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity
Z Li, C Chen, T Xu, Z Qin, J Xiao, R Sun, ZQ Luo
arXiv preprint arXiv:2408.16673, 2024
Year: 2024
Sensing Jamming Strategy from Limited Observations: An Imitation Learning Perspective
Y Fan, B Jiu, W Pu, Z Li, K Li, H Liu
IEEE Transactions on Signal Processing, 2024
Year: 2024
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation
C Jia, P Wang, Z Li, YC Li, Z Zhang, N Tang, Y Yu
arXiv preprint arXiv:2405.17039, 2024
Year: 2024