A GPGPU compiler for memory optimization and parallelism management
Y Yang, P Xiang, J Kong, H Zhou
ACM SIGPLAN Notices 45 (6), 86-97, 2010
CPU-Assisted GPGPU on Fused CPU-GPU Architectures
Y Yang, P Xiang, M Mantor, H Zhou
CUDA-NP: realizing nested thread-level parallelism in GPGPU applications
Y Yang, H Zhou
ACM SIGPLAN Notices 49 (8), 93-106, 2014
Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput
Y Yang, P Xiang, M Mantor, N Rubin, H Zhou
Proceedings of the 21st international conference on Parallel architectures …, 2012
Warp-level divergence in GPUs: Characterization, impact, and mitigation
P Xiang, Y Yang, H Zhou
2014 IEEE 20th International Symposium on High Performance Computer …, 2014
Optimizing memory efficiency for deep convolutional neural networks on GPUs
C Li, Y Yang, M Feng, S Chakradhar, H Zhou
SC'16: Proceedings of the International Conference for High Performance …, 2016
Accelerating MATLAB image processing toolbox functions on GPUs
J Kong, M Dimitrov, Y Yang, J Liyanage, L Cao, J Staples, M Mantor, ...
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics …, 2010
Locality principle revisited: A probability-based quantitative approach
S Gupta, P Xiang, Y Yang, H Zhou
Journal of Parallel and Distributed Computing, 2013
Locality Principle Revisited: A Probability-Based Quantitative Approach
S Gupta, P Xiang, Y Yang, H Zhou
IEEE International Parallel & Distributed Processing Symposium, 995-1009, 2012
Accelerating deep neural network training with inconsistent stochastic gradient descent
L Wang, Y Yang, R Min, S Chakradhar
Neural Networks 93, 219-229, 2017
BLASX: A high performance level-3 BLAS library for heterogeneous multi-GPU computing
L Wang, W Wu, Z Xu, J Xiao, Y Yang
Proceedings of the 2016 International Conference on Supercomputing, 1-11, 2016
A unified optimizing compiler framework for different GPGPU architectures
Y Yang, P Xiang, J Kong, M Mantor, H Zhou
ACM Transactions on Architecture and Code Optimization (TACO) 9 (2), 1-33, 2012
Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement
P Xiang, Y Yang, M Mantor, N Rubin, LR Hsu, H Zhou
ICS, 433-442, 2013
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs
C Li, Y Yang, H Dai, S Yan, F Mueller, H Zhou
2014 IEEE International Symposium on Performance Analysis of Systems and …, 2014
Automatic data placement into GPU on-chip memory resources
C Li, Y Yang, Z Lin, H Zhou
2015 IEEE/ACM International Symposium on Code Generation and Optimization …, 2015
Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs
Y Yang, P Xiang, M Mantor, H Zhou
International Conference on Parallel Processing, 2012
Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors
N Ravi, Y Yang, T Bao, S Chakradhar
Proceedings of the 26th ACM international conference on Supercomputing, 47-58, 2012
A case for a flexible scalar unit in SIMT architecture
Y Yang, P Xiang, M Mantor, N Rubin, L Hsu, Q Dong, H Zhou
2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014
An optimizing compiler for GPGPU programs with input-data sharing
Y Yang, P Xiang, J Kong, H Zhou
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of …, 2010
COMP: Compiler optimizations for manycore processors
L Song, M Feng, N Ravi, Y Yang, S Chakradhar
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 659-671, 2014
