InternLM2 Technical Report Z Cai, M Cao, H Chen, K Chen, K Chen, X Chen, X Chen, Z Chen, Z Chen, ... arXiv preprint arXiv:2403.17297, 2024 | 196 | 2024 |
Characterization and prediction of deep learning workloads in large-scale GPU datacenters Q Hu, P Sun, S Yan, Y Wen, T Zhang Proceedings of the International Conference for High Performance Computing …, 2021 | 135 | 2021 |
GradientFlow: Optimizing network performance for large-scale distributed DNN training P Sun, Y Wen, R Han, W Feng, S Yan IEEE Transactions on Big Data 8 (2), 495-507, 2019 | 114* | 2019 |
A chunk caching location and searching scheme in content centric networking Y Li, T Lin, H Tang, P Sun 2012 IEEE International Conference on Communications (ICC), 2655-2659, 2012 | 93 | 2012 |
Deep Learning Workload Scheduling in GPU Datacenters: A Survey Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo, T Zhang, Y Wen ACM Computing Surveys 56 (6), 1-38, 2024 | 53* | 2024 |
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach P Sun, Y Wen, NBD Ta, S Yan 2017 IEEE International Conference on Smart Computing (SMARTCOMP), 1-6, 2017 | 50 | 2017 |
Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs W Gao, Z Ye, P Sun, Y Wen, T Zhang Proceedings of the ACM Symposium on Cloud Computing, 609-623, 2021 | 41 | 2021 |
Characterization of large language model development in the datacenter Q Hu, Z Ye, Z Wang, G Wang, M Zhang, Q Chen, P Sun, D Lin, X Wang, ... 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024 | 28 | 2024 |
Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs Q Hu, M Zhang, P Sun, Y Wen, T Zhang Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 25 | 2023 |
Timed dataflow: Reducing communication overhead for distributed machine learning systems P Sun, Y Wen, TNB Duong, S Yan 2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016 | 20 | 2016 |
Cloud3DView: An interactive tool for cloud data center operations J Yin, P Sun, Y Wen, H Gong, M Liu, X Li, H You, J Gao, C Lin Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, 499-500, 2013 | 19 | 2013 |
Elan: Towards Generic and Efficient Elastic Training for Deep Learning L Xie, J Zhai, B Wu, Y Wang, X Zhang, P Sun, S Yan 2020 IEEE 40th International Conference on Distributed Computing Systems …, 2020 | 18 | 2020 |
LoongServe: Efficiently serving long-context large language models with elastic sequence parallelism B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles …, 2024 | 17 | 2024 |
Astraea: A fair deep learning scheduler for multi-tenant GPU clusters Z Ye, P Sun, W Gao, T Zhang, X Wang, S Yan, Y Luo IEEE Transactions on Parallel and Distributed Systems 33 (11), 2781-2793, 2021 | 16 | 2021 |
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine P Sun, Y Wen, TNB Duong, X Xiao 2017 IEEE 23rd International Conference on Parallel and Distributed Systems …, 2017 | 15 | 2017 |
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning C Chen, X Li, Q Zhu, J Duan, P Sun, X Zhang, C Yang Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 13 | 2024 |
ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems Y Huang, H Zhang, Y Wen, P Sun, NBD Ta arXiv preprint arXiv:2106.03122, 2021 | 12 | 2021 |
Graphh: High performance big graph analytics in small clusters P Sun, Y Wen, TNB Duong, X Xiao 2017 IEEE International Conference on Cluster Computing (CLUSTER), 256-266, 2017 | 12 | 2017 |
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | 11 | 2024 |
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters Q Hu, Z Ye, M Zhang, Q Chen, P Sun, Y Wen, T Zhang 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 9 | 2023 |