Yinmin Zhong
Title
Cited by
Year
AlpaServe: Statistical multiplexing with model parallelism for deep learning serving
Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin, Y Huang, Z Chen, H Zhang, ...
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023
Cited by: 80, Year: 2023
Fast distributed inference serving for large language models
B Wu, Y Zhong, Z Zhang, G Huang, X Liu, X Jin
arXiv preprint arXiv:2305.05920, 2023
Cited by: 33, Year: 2023
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen, Z Zhang, Y Peng, X Li, C Xie, ...
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), 2024
Cited by: 26, Year: 2024
DistServe: Disaggregating prefill and decoding for goodput-optimized large language model serving
Y Zhong, S Liu, J Chen, J Hu, Y Zhu, X Liu, X Jin, H Zhang
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), 2024
Cited by: 20, Year: 2024
ElasticFlow: An elastic serverless training platform for distributed deep learning
D Gu, Y Zhao, Y Zhong, Y Xiong, Z Han, P Cheng, F Yang, G Huang, X Jin, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by: 14, Year: 2023
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin
arXiv preprint arXiv:2404.09526, 2024
Cited by: 6, Year: 2024
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads
X Jin, Z Bai, Z Zhang, Y Zhu, Y Zhong, X Liu
IEEE/ACM Transactions on Networking, 2024
Cited by: 2, Year: 2024
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
L Chang, W Bao, Q Hou, C Jiang, N Zheng, Y Zhong, X Zhang, Z Song, ...
arXiv preprint arXiv:2406.06858, 2024
Year: 2024