Large batch optimization for deep learning: Training BERT in 76 minutes Y You, J Li, S Reddi, J Hseu, S Kumar, S Bhojanapalli, X Song, J Demmel, ... arXiv preprint arXiv:1904.00962, 2019 | 1022 | 2019 |
Large batch training of convolutional networks Y You, I Gitman, B Ginsburg arXiv preprint arXiv:1708.03888, 2017 | 857 | 2017 |
ImageNet training in minutes Y You, Z Zhang, CJ Hsieh, J Demmel, K Keutzer Proceedings of the 47th International Conference on Parallel Processing, 1-10, 2018 | 496 | 2018 |
Scaling SGD batch size to 32k for ImageNet training Y You, I Gitman, B Ginsburg arXiv preprint arXiv:1708.03888 6 (12), 6, 2017 | 415 | 2017 |
CAFE: Learning to condense dataset by aligning features K Wang, B Zhao, X Peng, Z Zhu, S Yang, S Wang, G Huang, H Bilen, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 200 | 2022 |
Towards efficient and scalable sharpness-aware minimization Y Liu, S Mai, X Chen, CJ Hsieh, Y You Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 124 | 2022 |
Crafting better contrastive views for Siamese representation learning X Peng, K Wang, Z Zhu, M Wang, Y You Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 115 | 2022 |
Reducing BERT pre-training time from 3 days to 76 minutes Y You, J Li, J Hseu, X Song, J Demmel, CJ Hsieh arXiv preprint arXiv:1904.00962 12, 2, 2019 | 112 | 2019 |
Colossal-AI: A unified deep learning system for large-scale parallel training S Li, H Liu, Z Bian, J Fang, H Huang, Y Liu, B Wang, Y You Proceedings of the 52nd International Conference on Parallel Processing, 766-775, 2023 | 105 | 2023 |
Large-batch training for LSTM and beyond Y You, J Hseu, C Ying, J Demmel, K Keutzer, CJ Hsieh Proceedings of the International Conference for High Performance Computing …, 2019 | 104 | 2019 |
Scaling deep learning on GPU and Knights Landing clusters Y You, A Buluç, J Demmel Proceedings of the International Conference for High Performance Computing …, 2017 | 98 | 2017 |
100-epoch ImageNet training with AlexNet in 24 minutes Y You, Z Zhang, C Hsieh, J Demmel, K Keutzer arXiv preprint arXiv:1709.05011 8, 2017 | 78 | 2017 |
Go wider instead of deeper F Xue, Z Shi, F Wei, Y Lou, Y Liu, Y You Proceedings of the AAAI Conference on Artificial Intelligence 36 (8), 8779-8787, 2022 | 73 | 2022 |
Fast deep neural network training on distributed systems and cloud TPUs Y You, Z Zhang, CJ Hsieh, J Demmel, K Keutzer IEEE Transactions on Parallel and Distributed Systems 30 (11), 2449-2462, 2019 | 64 | 2019 |
DREAM: Efficient dataset distillation by representative matching Y Liu, J Gu, K Wang, Z Zhu, W Jiang, Y You Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 58 | 2023 |
Asynchronous parallel greedy coordinate descent Y You, X Lian, J Liu, HF Yu, IS Dhillon, J Demmel, CJ Hsieh Advances in Neural Information Processing Systems, 4682-4690, 2016 | 53 | 2016 |
MIC-SVM: Designing a highly efficient support vector machine for advanced modern multi-core and many-core architectures Y You, SL Song, H Fu, A Marquez, MM Dehnavi, K Barker, KW Cameron, ... 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014 | 52 | 2014 |
Preventing zero-shot transfer degradation in continual learning of vision-language models Z Zheng, M Ma, K Wang, Z Qin, X Yue, Y You Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 51 | 2023 |
Prompt vision transformer for domain generalization Z Zheng, X Yue, K Wang, Y You arXiv preprint arXiv:2208.08914, 2022 | 50 | 2022 |
To repeat or not to repeat: Insights from scaling LLM under token-crisis F Xue, Y Fu, W Zhou, Z Zheng, Y You Advances in Neural Information Processing Systems 36, 2024 | 46 | 2024 |