Coordinated static and dynamic cache bypassing for GPUs X Xie, Y Liang, Y Wang, G Sun, T Wang 2015 IEEE 21st International Symposium on High Performance Computer …, 2015 | 167 | 2015 |
An efficient compiler framework for cache bypassing on GPUs X Xie, Y Liang, G Sun, D Chen 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 516-523, 2013 | 128 | 2013 |
Enabling Coordinated Register Allocation and Thread-level Parallelism Optimization for GPUs X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan IEEE/ACM International Symposium on Microarchitecture, 2015 | 83 | 2015 |
CuMF_SGD: Parallelized stochastic gradient descent for matrix factorization on GPUs X Xie, W Tan, LL Fong, Y Liang Proceedings of the 26th International Symposium on High-Performance Parallel …, 2017 | 68* | 2017 |
An Efficient Compiler Framework for Cache Bypassing on GPUs Y Liang, X Xie, G Sun, D Chen IEEE, 2015 | 27 | 2015 |
Operon: An encrypted database for ownership-preserving data management S Wang, Y Li, H Li, F Li, C Tian, L Su, Y Zhang, Y Ma, L Yan, Y Sun, ... Proceedings of the VLDB Endowment 15 (12), 3332-3345, 2022 | 21 | 2022 |
Performance-centric register file design for GPUs using racetrack memory S Wang, Y Liang, C Zhang, X Xie, G Sun, Y Liu, Y Wang, X Li 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 25-30, 2016 | 20 | 2016 |
CRAT: Enabling coordinated register allocation and thread-level parallelism optimization for GPUs X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan IEEE Transactions on Computers 67 (6), 890-897, 2017 | 17 | 2017 |
CuLDA: solving large-scale LDA Problems on GPUs X Xie, Y Liang, X Li, W Tan Proceedings of the 28th International Symposium on High-Performance Parallel …, 2019 | 15* | 2019 |
Exploring cache bypassing and partitioning for multi-tasking on GPUs Y Liang, X Li, X Xie 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 9-16, 2017 | 12 | 2017 |
Optimizing cache bypassing and warp scheduling for GPUs Y Liang, X Xie, Y Wang, G Sun, T Wang IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2017 | 12 | 2017 |
Accelerating multi-way joins on the GPU Z Lai, X Sun, Q Luo, X Xie The VLDB Journal, 1-25, 2022 | 10 | 2022 |
Adaptive parallelism of task execution on machines with accelerators LL Fong, W Tan, X Xie, H Zhou US Patent 10,203,988, 2019 | 4 | 2019 |
Efficient data-parallel primitives on heterogeneous systems Z Lai, Q Luo, X Xie Proceedings of the 48th International Conference on Parallel Processing, 1-10, 2019 | 2 | 2019 |
Matrix factorization with two-stage data block dispatch associated with graphics processing units E Duesterwald, LL Fong, W Tan, X Xie US Patent 11,487,847, 2022 | | 2022 |
Matrix factorization with two-stage data block dispatch associated with graphics processing units E Duesterwald, LL Fong, W Tan, X Xie US Patent 10,380,222, 2019 | | 2019 |