Ching-Hsiang Chu
Research Scientist, Meta/Facebook
Verified email at meta.com
Title
Cited by
Year
Software-hardware co-design for fast and scalable training of deep learning recommendation models
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ...
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Cited by: 113 | Year: 2022
The MVAPICH project: Transforming research into high-performance MPI library for HPC community
DK Panda, H Subramoni, CH Chu, M Bayatpour
Journal of Computational Science 52, 101208, 2021
Cited by: 81 | Year: 2021
Scalable distributed DNN training using TensorFlow and CUDA-aware MPI: Characterization, designs, and performance evaluation
AA Awan, J Bédorf, CH Chu, H Subramoni, DK Panda
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2019
Cited by: 61 | Year: 2019
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018
Cited by: 60 | Year: 2018
NV-Group: link-efficient reduction for distributed deep learning on modern dense GPU systems
CH Chu, P Kousha, AA Awan, KS Khorassani, H Subramoni, DK Panda
Proceedings of the 34th ACM International Conference on Supercomputing, 1-12, 2020
Cited by: 49 | Year: 2020
M. khorashadi, P
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ...
Bhattacharya, P. Lapukhov, M. Naumov, L. Qiao, M. Smelyanskiy, B. Jia, and V …, 2021
Cited by: 48 | Year: 2021
OC-DNN: Exploiting advanced unified memory capabilities in CUDA 9 and Volta GPUs for out-of-core DNN training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
Cited by: 41 | Year: 2018
Designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters
Q Zhou, C Chu, NS Kumar, P Kousha, SM Ghazimirsaeed, H Subramoni, ...
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021
Cited by: 36 | Year: 2021
High-performance, distributed training of large-scale deep learning recommendation models
D Mudigere, Y Hao, J Huang, A Tulloch, S Sridharan, X Liu, M Ozdal, ...
arXiv preprint arXiv:2104.05158, 2021
Cited by: 34 | Year: 2021
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by: 30 | Year: 2015
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 29 | Year: 2016
Performance evaluation of MPI libraries on GPU-enabled OpenPOWER architectures: Early experiences
KS Khorassani, CH Chu, H Subramoni, DK Panda
High Performance Computing: ISC High Performance 2019 International …, 2019
Cited by: 28 | Year: 2019
Improving SCTP performance by jitter-based congestion control over wired-wireless networks
JM Chen, CH Chu, EHK Wu, MF Tsai, JR Wang
EURASIP Journal on Wireless Communications and Networking 2011, 1-13, 2011
Cited by: 28 | Year: 2011
Characterizing CUDA unified memory (UM)-aware MPI designs on modern GPU architectures
KV Manian, AA Ammar, A Ruhela, CH Chu, H Subramoni, DK Panda
Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 43-52, 2019
Cited by: 25 | Year: 2019
Efficient and scalable multi-source streaming broadcast on GPU clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
Cited by: 25 | Year: 2017
Designing a profiling and visualization tool for scalable and in-depth analysis of high-performance GPU clusters
P Kousha, B Ramesh, KK Suresh, CH Chu, A Jain, N Sarkauskas, ...
2019 IEEE 26th International Conference on High Performance Computing, Data …, 2019
Cited by: 24 | Year: 2019
Communication profiling and characterization of deep-learning workloads on clusters with high-performance interconnects
AA Awan, A Jain, CH Chu, H Subramoni, DK Panda
IEEE Micro 40 (1), 35-43, 2019
Cited by: 22 | Year: 2019
Designing a ROCm-aware MPI library for AMD GPUs: early experiences
K Shafie Khorassani, J Hashmi, CH Chu, CC Chen, H Subramoni, ...
International Conference on High Performance Computing, 118-136, 2021
Cited by: 21 | Year: 2021
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?
AA Awan, KV Manian, CH Chu, H Subramoni, DK Panda
Parallel Computing 85, 141-152, 2019
Cited by: 20 | Year: 2019
Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE
K Mahajan, CH Chu, S Sridharan, A Akella
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
Cited by: 18 | Year: 2023