Neural auto-curricula in two-player zero-sum games X Feng, O Slumbers, Z Wan, B Liu, S McAleer, Y Wen, J Wang, Y Yang Advances in Neural Information Processing Systems 34, 3504-3517, 2021 | 45* | 2021 |
MALib: A parallel framework for population-based multi-agent reinforcement learning M Zhou, Z Wan, H Wang, M Wen, R Wu, Y Wen, Y Yang, Y Yu, J Wang, ... Journal of Machine Learning Research 24 (150), 1-12, 2023 | 36 | 2023 |
Alphazero-like tree-search can guide large language model decoding and training X Feng, Z Wan, M Wen, Y Wen, W Zhang, J Wang arXiv preprint arXiv:2309.17179, 2023 | 15 | 2023 |
Order matters: Agent-by-agent policy optimization X Wang, Z Tian, Z Wan, Y Wen, J Wang, W Zhang arXiv preprint arXiv:2302.06205, 2023 | 8 | 2023 |
On realization of intelligent decision-making in the real world: A foundation decision model perspective Y Wen, Z Wan, M Zhou, S Hou, Z Cao, C Le, J Chen, Z Tian, W Zhang, ... arXiv preprint arXiv:2212.12669, 2022 | 3 | 2022 |
Natural Language Reinforcement Learning X Feng, Z Wan, M Yang, Z Wang, GA Koushik, Y Du, Y Wen, J Wang arXiv preprint arXiv:2402.07157, 2024 | | 2024 |