Xulong Tang
Scheduling techniques for GPU architectures with processing-in-memory capabilities
A Pattnaik, X Tang, A Jog, O Kayiran, AK Mishra, MT Kandemir, O Mutlu, ...
Proceedings of the 2016 International Conference on Parallel Architectures …, 2016
YOLObile: Real-time object detection on mobile devices via compression-compilation co-design
Y Cai, H Li, G Yuan, W Niu, Y Li, X Tang, B Ren, Y Wang
Proceedings of the AAAI Conference on Artificial Intelligence 35 (2), 955-963, 2021
Controlled kernel launch for dynamic parallelism in GPUs
X Tang, A Pattnaik, H Jiang, O Kayiran, A Jog, S Pai, M Ibrahim, ...
2017 IEEE International Symposium on High Performance Computer Architecture …, 2017
Data movement aware computation partitioning
X Tang, O Kislal, M Kandemir, M Karakoy
Proceedings of the 50th Annual IEEE/ACM International Symposium on …, 2017
Opportunistic computing in GPU architectures
A Pattnaik, X Tang, O Kayiran, A Jog, A Mishra, MT Kandemir, ...
Proceedings of the 46th International Symposium on Computer Architecture …, 2019
Improving bank-level parallelism for irregular applications
X Tang, M Kandemir, P Yedlapalli, J Kotra
2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016
μC-States: Fine-grained GPU datapath power management
O Kayiran, A Jog, A Pattnaik, R Ausavarungnirun, X Tang, MT Kandemir, ...
Proceedings of the 2016 International Conference on Parallel Architectures …, 2016
Memory row reuse distance and its role in optimizing application performance
M Kandemir, H Zhao, X Tang, M Karakoy
Proceedings of the 2015 ACM SIGMETRICS International Conference on …, 2015
Automated runtime-aware scheduling for multi-tenant DNN inference on GPU
F Yu, S Bray, D Wang, L Shangguan, X Tang, C Liu, X Chen
2021 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9, 2021
Algorithm-hardware co-design of attention mechanism on FPGA devices
X Zhang, Y Wu, P Zhou, X Tang, J Hu
ACM Transactions on Embedded Computing Systems (TECS) 20 (5s), 1-24, 2021
Optimizing off-chip accesses in multicores
W Ding, X Tang, M Kandemir, Y Zhang, E Kultursay
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language …, 2015
Oversubscribed command queues in GPUs
S Puthoor, X Tang, J Gross, BM Beckmann
Proceedings of the 11th Workshop on General Purpose GPUs, 50-60, 2018
Enhancing computation-to-core assignment with physical location information
O Kislal, J Kotra, X Tang, MT Kandemir, M Jung
ACM SIGPLAN Notices 53 (4), 312-327, 2018
FlexBFS: a parallelism-aware implementation of breadth-first search on GPU
G Liu, H An, W Han, X Li, T Sun, W Zhou, X Wei, X Tang
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of …, 2012
DEMM: a Dynamic Energy-saving mechanism for Multicore Memories
A Sharifi, W Ding, D Guttman, H Zhao, X Tang, M Kandemir, C Das
2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation …, 2017
Enabling latency-aware data initialization for integrated CPU/GPU heterogeneous platform
Z Wang, Z Jiang, Z Wang, X Tang, C Liu, S Yin, Y Hu
IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2020
POSTER: Location-Aware Computation Mapping for Manycore Processors
O Kislal, J Kotra, X Tang, MT Kandemir, M Jung
2017 26th International Conference on Parallel Architectures and Compilation …, 2017
Parallelizing DNN training on GPUs: Challenges and opportunities
W Xu, Y Zhang, X Tang
Companion Proceedings of the Web Conference 2021, 174-178, 2021
Enhancing address translations in throughput processors via compression
X Tang, Z Zhang, W Xu, MT Kandemir, R Melhem, J Yang
Proceedings of the ACM International Conference on Parallel Architectures …, 2020
Computing with near data
X Tang, MT Kandemir, H Zhao, M Jung, M Karakoy
ACM SIGMETRICS Performance Evaluation Review 47 (1), 27-28, 2019