Galvatron: Efficient transformer training over multiple GPUs using automatic parallelism | X Miao, Y Wang, Y Jiang, C Shi, X Nie, H Zhang, B Cui | arXiv preprint arXiv:2211.13878, 2022 | 31 | 2022 |
SpotServe: Serving generative large language models on preemptible instances | X Miao, C Shi, J Duan, X Xi, D Lin, B Cui, Z Jia | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 14 | 2024 |
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge | B Xiao, C Shi, X Nie, F Yang, X Deng, L Su, W Chen, B Cui | arXiv preprint arXiv:2405.00263, 2024 | | 2024 |
SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification | X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ... | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | | 2024 |