Frederik Kunstner
Limitations of the empirical Fisher approximation for natural gradient descent
F Kunstner, L Balles, P Hennig
Advances in Neural Information Processing Systems 32, 4158–4169, 2019
BackPACK: Packing more into Backprop
F Dangel, F Kunstner, P Hennig
International Conference on Learning Representations, 2020
SLANG: Fast structured covariance approximations for Bayesian deep learning with natural gradient
A Mishkin, F Kunstner, D Nielsen, M Schmidt, ME Khan
Advances in Neural Information Processing Systems 31, 6248–6258, 2018
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be
F Kunstner, J Chen, JW Lavington, M Schmidt
International Conference on Learning Representations, 2023
Adaptive gradient methods converge faster with over-parameterization (but you should do a line-search)
S Vaswani, I Laradji, F Kunstner, SY Meng, M Schmidt, S Lacoste-Julien
arXiv preprint arXiv:2006.06835, 2020
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F Kunstner, R Kumar, M Schmidt
International Conference on Artificial Intelligence and Statistics 130, 3295 …, 2021
Fully Quantized Distributed Gradient Descent
F Künstner, SU Stich, M Jaggi
Technical report, EPFL, 2017
Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models
F Kunstner, R Yadav, A Milligan, M Schmidt, A Bietti
arXiv preprint arXiv:2402.19449, 2024
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
F Kunstner, V Sanches Portella, M Schmidt, N Harvey
Advances in Neural Information Processing Systems 36, 2024
Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent--an Open Problem
RL Priol, F Kunstner, D Scieur, S Lacoste-Julien
arXiv preprint arXiv:2111.06826, 2021
Variance Reduced Model Based Methods: New rates and adaptive step sizes
RM Gower, F Kunstner, M Schmidt
OPT 2023: Optimization for Machine Learning, 2023