Yinmin Zhong
{AlpaServe}: Statistical multiplexing with model parallelism for deep learning serving
Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin, Y Huang, Z Chen, H Zhang, ...
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023
Fast distributed inference serving for large language models
B Wu, Y Zhong, Z Zhang, G Huang, X Liu, X Jin
arXiv preprint arXiv:2305.05920, 2023
ElasticFlow: An elastic serverless training platform for distributed deep learning
D Gu, Y Zhao, Y Zhong, Y Xiong, Z Han, P Cheng, F Yang, G Huang, X Jin, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2023
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Y Zhong, S Liu, J Chen, J Hu, Y Zhu, X Liu, X Jin, H Zhang
arXiv preprint arXiv:2401.09670, 2024
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin
arXiv preprint arXiv:2404.09526, 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen, Z Zhang, Y Peng, X Li, C Xie, ...
arXiv preprint arXiv:2402.15627, 2024
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads
X Jin, Z Bai, Z Zhang, Y Zhu, Y Zhong, X Liu
IEEE/ACM Transactions on Networking, 2024