HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. Z Dong, Z Yao, D Arfeen, A Gholami, MW Mahoney, K Keutzer. Advances in Neural Information Processing Systems 33, 2020 | 237 | 2020 |
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification. X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, RYY Wong, Z Chen, ... arXiv preprint arXiv:2305.09781, 2023 | 50 | 2023 |
Large batch size training of neural networks with adversarial training and second-order information. Z Yao, A Gholami, D Arfeen, R Liaw, J Gonzalez, K Keutzer, M Mahoney. arXiv preprint arXiv:1810.01021, 2018 | 50 | 2018 |
Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling. S Jayaram Subramanya, D Arfeen, S Lin, A Qiao, Z Jia, GR Ganger. Proceedings of the 29th Symposium on Operating Systems Principles, 642-657, 2023 | 10 | 2023 |