Frederik Kunstner
Verified email at cs.ubc.ca
Title · Cited by · Year
Limitations of the empirical Fisher approximation for natural gradient descent
F Kunstner, L Balles, P Hennig
Advances in Neural Information Processing Systems 32, 4158--4169, 2019
Cited by 225 · 2019
BackPACK: Packing more into Backprop
F Dangel, F Kunstner, P Hennig
International Conference on Learning Representations, 2020
Cited by 119 · 2020
SLANG: Fast structured covariance approximations for Bayesian deep learning with natural gradient
A Mishkin, F Kunstner, D Nielsen, M Schmidt, ME Khan
Advances in Neural Information Processing Systems 31, 6248--6258, 2018
Cited by 76 · 2018
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be
F Kunstner, J Chen, JW Lavington, M Schmidt
International Conference on Learning Representations, 2023
Cited by 55* · 2023
Adaptive gradient methods converge faster with over-parameterization (but you should do a line-search)
S Vaswani, I Laradji, F Kunstner, SY Meng, M Schmidt, S Lacoste-Julien
arXiv preprint arXiv:2006.06835, 2020
Cited by 43* · 2020
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F Kunstner, R Kumar, M Schmidt
International Conference on Artificial Intelligence and Statistics 130, 3295 …, 2021
Cited by 36 · 2021
Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models
F Kunstner, R Yadav, A Milligan, M Schmidt, A Bietti
arXiv preprint arXiv:2402.19449, 2024
Cited by 15 · 2024
Fully Quantized Distributed Gradient Descent
F Künstner, SU Stich, M Jaggi
Technical report, EPFL, 2017
Cited by 8 · 2017
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
F Kunstner, V Sanches Portella, M Schmidt, N Harvey
Advances in Neural Information Processing Systems 36, 2725--2767, 2023
Cited by 6 · 2023
Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent--an Open Problem
RL Priol, F Kunstner, D Scieur, S Lacoste-Julien
arXiv preprint arXiv:2111.06826, 2021
Cited by 3 · 2021
Variance Reduced Model Based Methods: New rates and adaptive step sizes
RM Gower, F Kunstner, M Schmidt
OPT 2023: Optimization for Machine Learning, 2023
Cited by 2 · 2023
Why do machine learning optimizers that work, work?
F Kunstner
University of British Columbia, 2024
2024
Normalization Matters for Optimization Performance on Graph Neural Networks
A Milligan, F Kunstner, H Shirzad, M Schmidt, DJ Sutherland
OPT 2024: Optimization for Machine Learning, 2024