A scalable Laplace approximation for neural networks H Ritter, A Botev, D Barber 6th International Conference on Learning Representations, ICLR 2018 …, 2018 | 487 | 2018 |
Online structured Laplace approximations for overcoming catastrophic forgetting H Ritter, A Botev, D Barber Advances in Neural Information Processing Systems 31, 2018 | 348 | 2018 |
Practical Gauss-Newton optimisation for deep learning A Botev, H Ritter, D Barber International Conference on Machine Learning, 557-565, 2017 | 259 | 2017 |
Hamiltonian generative networks P Toth, DJ Rezende, A Jaegle, S Racanière, A Botev, I Higgins arXiv preprint arXiv:1909.13789, 2019 | 249 | 2019 |
Nesterov's accelerated gradient and momentum as approximations to regularised update descent A Botev, G Lever, D Barber 2017 International Joint Conference on Neural Networks (IJCNN), 1899-1903, 2017 | 198 | 2017 |
Griffin: Mixing gated linear recurrences with local attention for efficient language models S De, SL Smith, A Fernando, A Botev, G Cristian-Muraru, A Gu, R Haroun, ... arXiv preprint arXiv:2402.19427, 2024 | 69 | 2024 |
Better, faster fermionic neural networks JS Spencer, D Pfau, A Botev, WMC Foulkes arXiv preprint arXiv:2011.07125, 2020 | 53 | 2020 |
Disentangling by subspace diffusion D Pfau, I Higgins, A Botev, S Racanière Advances in Neural Information Processing Systems 33, 17403-17415, 2020 | 35 | 2020 |
Deep transformers without shortcuts: Modifying self-attention for faithful signal propagation B He, J Martens, G Zhang, A Botev, A Brock, SL Smith, YW Teh arXiv preprint arXiv:2302.10322, 2023 | 33 | 2023 |
Complementary Sum Sampling for Likelihood Approximation in Large Scale Classification A Botev, B Zheng, D Barber AISTATS 54, 1030-1038, 2017 | 33 | 2017 |
Deep learning without shortcuts: Shaping the kernel with tailored rectifiers G Zhang, A Botev, J Martens arXiv preprint arXiv:2203.08120, 2022 | 32 | 2022 |
Applications of flow models to the generation of correlated lattice QCD ensembles R Abbott, A Botev, D Boyda, DC Hackett, G Kanwar, S Racanière, ... Physical Review D 109 (9), 094514, 2024 | 30 | 2024 |
Which priors matter? Benchmarking models for learning latent dynamics A Botev, A Jaegle, P Wirnsberger, D Hennes, I Higgins arXiv preprint arXiv:2111.05458, 2021 | 28 | 2021 |
Aspects of scaling and scalability for flow-based sampling of lattice QCD R Abbott, MS Albergo, A Botev, D Boyda, K Cranmer, DC Hackett, ... The European Physical Journal A 59 (11), 257, 2023 | 25 | 2023 |
Sampling QCD field configurations with gauge-equivariant flow models R Abbott, MS Albergo, A Botev, D Boyda, K Cranmer, DC Hackett, ... arXiv preprint arXiv:2208.03832, 2022 | 20 | 2022 |
Normalizing flows for lattice gauge theory in arbitrary space-time dimension R Abbott, MS Albergo, A Botev, D Boyda, K Cranmer, DC Hackett, ... arXiv preprint arXiv:2305.02402, 2023 | 18 | 2023 |
The Gauss-Newton matrix for Deep Learning models and its applications A Botev UCL (University College London), 2020 | 10 | 2020 |
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models A Botev, S De, SL Smith, A Fernando, GC Muraru, R Haroun, L Berrada, ... arXiv preprint arXiv:2404.07839, 2024 | 6 | 2024 |
SyMetric: Measuring the quality of learnt Hamiltonian dynamics inferred from vision I Higgins, P Wirnsberger, A Jaegle, A Botev Advances in Neural Information Processing Systems 34, 25591-25605, 2021 | 6 | 2021 |
Dealing with a large number of classes--Likelihood, Discrimination or Ranking? D Barber, A Botev arXiv preprint arXiv:1606.06959, 2016 | 5 | 2016 |