Hyperparameter selection for offline reinforcement learning TL Paine, C Paduraru, A Michi, C Gulcehre, K Zolna, A Novikov, Z Wang, ... arXiv preprint arXiv:2007.09055, 2020 | 149 | 2020 |
Faster sorting algorithms discovered using deep reinforcement learning DJ Mankowitz, A Michi, A Zhernov, M Gelmi, M Selvi, C Paduraru, ... Nature 618 (7964), 257-263, 2023 | 106 | 2023 |
Nash learning from human feedback R Munos, M Valko, D Calandriello, MG Azar, M Rowland, ZD Guo, Y Tang, ... arXiv preprint arXiv:2312.00886, 2023 | 31 | 2023 |
A generic human–machine annotation framework based on dynamic cooperative learning Y Zhang, A Michi, J Wagner, E André, B Schuller, F Weninger IEEE Transactions on Cybernetics 50 (3), 1230-1239, 2019 | 17 | 2019 |
Towards practical reinforcement learning for tokamak magnetic control BD Tracey, A Michi, Y Chervonyi, I Davies, C Paduraru, N Lazic, F Felici, ... Fusion Engineering and Design 200, 114161, 2024 | 3 | 2024 |
Towards practical reinforcement learning for tokamak magnetic control BD Tracey, A Michi, Y Chervonyi, I Davies, C Paduraru, N Lazic, F Felici, ... arXiv preprint arXiv:2307.11546, 2023 | 3 | 2023 |
Offline hyperparameter selection for offline reinforcement learning T Le Paine, C Paduraru, A Michi, C Gulcehre, K Zołna, A Novikov, ... | | |