Shigang Li
Postdoctoral Researcher, ETH Zurich, Department of Computer Science, SPCL
NUMA-aware shared-memory collective communication for MPI
S Li, T Hoefler, M Snir
Proceedings of the 22nd international symposium on High-performance parallel …, 2013
Parallel processing systems for big data: a survey
Y Zhang, T Cao, S Li, X Tian, L Yuan, H Jia, AV Vasilakos
Proceedings of the IEEE 104 (11), 2114-2136, 2016
Improved MPI collectives for MPI processes in shared address spaces
S Li, T Hoefler, C Hu, M Snir
Cluster Computing 17 (4), 1139-1155, 2014
Taming unbalanced training workloads in deep learning with partial collective operations
S Li, T Ben-Nun, SD Girolamo, D Alistarh, T Hoefler
Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020
Deep learning for post-processing ensemble weather forecasts
P Grönquist, C Yao, T Ben-Nun, N Dryden, P Dueben, S Li, T Hoefler
Philosophical Transactions of the Royal Society A 379 (2194), 20200092, 2021
Asynchronous work stealing on distributed memory systems
S Li, J Hu, X Cheng, C Zhao
2013 21st Euromicro International Conference on Parallel, Distributed, and …, 2013
Cache-oblivious MPI all-to-all communications based on Morton order
S Li, Y Zhang, T Hoefler
IEEE Transactions on Parallel and Distributed Systems, 2018
Kernel optimization for short-range molecular dynamics
C Hu, X Wang, J Li, X He, S Li, Y Feng, S Yang, H Bai
Computer Physics Communications, 2016
Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation
B Wu, S Li, Y Zhang, N Nie
Computer Physics Communications, 2016
Fast Convolution Operations on Many-Core Architectures
S Li, Y Zhang, C Xiang, L Shi
High Performance Computing and Communications (HPCC), 2015 IEEE 7th …, 2015
A Cross-Platform SpMV Framework on Many-Core Architectures
Y Zhang, S Li, S Yan, H Zhou
ACM Transactions on Architecture and Code Optimization (TACO) 13 (4), 33, 2016
Efficient parallel optimizations of a high-performance SIFT on GPUs
Z Li, H Jia, Y Zhang, S Liu, S Li, X Wang, H Zhang
Journal of Parallel and Distributed Computing, 2018
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer
S Li, B Wu, Y Zhang, X Wang, J Li, C Hu, J Wang, Y Feng, N Nie
Proceedings of the 47th International Conference on Parallel Processing, 47, 2018
Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations
X Zhu, J Zhang, K Yoshii, S Li, Y Zhang, P Balaji
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International …, 2015
CASESM 2: Description and climate simulation performance of the Chinese Academy of Sciences (CAS) Earth System Model (ESM) version 2
H Zhang, M Zhang, J Jin, K Fei, D Ji, C Wu, J Zhu, J He, Z Chai, J Xie, ...
Journal of Advances in Modeling Earth Systems, e2020MS002210, 2020
OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight
K Li, H Shang, Y Zhang, S Li, B Wu, D Wang, L Zhang, F Li, D Chen, ...
Proceedings of the International Conference for High Performance Computing …, 2019
Predicting Weather Uncertainty with Deep Convnets
P Grönquist, T Ben-Nun, N Dryden, P Dueben, L Lavarini, S Li, T Hoefler
arXiv preprint arXiv:1911.00630, 2019
Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model
J Xiao, S Li, B Wu, H Zhang, K Li, E Yao, Y Zhang, G Tan
Proceedings of the 47th International Conference on Parallel Processing, 12, 2018
POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures
S Li, Y Zhang, T Hoefler
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017
Extending synchronization constructs in OpenMP to exploit pipeline parallelism on heterogeneous multi-core
S Li, S Yao, H He, L Sun, Y Chen, Y Peng
International Conference on Algorithms and Architectures for Parallel …, 2011
