Bingyang Wu
Title
Cited by
Year
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction
S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Cited by: 30 | Year: 2022
Fast distributed inference serving for large language models
B Wu, Y Zhong, Z Zhang, G Huang, X Liu, X Jin
arXiv preprint arXiv:2305.05920, 2023
Cited by: 18 | Year: 2023
A survey of resource-efficient LLM and multimodal foundation models
M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu, Y Zhao, C Yang, S Wang, ...
arXiv preprint arXiv:2401.08092, 2024
Cited by: 13 | Year: 2024
Transparent GPU sharing in container clouds for deep learning workloads
B Wu, Z Zhang, Z Bai, X Liu, X Jin
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
Cited by: 11 | Year: 2023
NeoFlow: A flexible framework for enabling efficient compilation for high performance DNN training
S Zheng, R Chen, Y Jin, A Wei, B Wu, X Li, S Yan, Y Liang
IEEE Transactions on Parallel and Distributed Systems 33 (11), 3220-3232, 2021
Cited by: 9 | Year: 2021
XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale
B Wu, K Qian, B Li, Y Ma, Q Zhang, Z Jiang, J Zhao, D Cai, E Zhai, X Liu, ...
Proceedings of the ACM SIGCOMM 2023 Conference, 696-709, 2023
Cited by: 1 | Year: 2023
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin
arXiv preprint arXiv:2404.09526, 2024
Year: 2024