A GPGPU compiler for memory optimization and parallelism management Y Yang, P Xiang, J Kong, H Zhou ACM SIGPLAN Notices 45 (6), 86-97, 2010 | 413 | 2010 |
Optimizing memory efficiency for deep convolutional neural networks on GPUs C Li, Y Yang, M Feng, S Chakradhar, H Zhou SC'16: Proceedings of the International Conference for High Performance …, 2016 | 139 | 2016 |
CPU-Assisted GPGPU on Fused CPU-GPU Architectures Y Yang, P Xiang, M Mantor, H Zhou | 134 | 2012 |
Accelerating deep neural network training with inconsistent stochastic gradient descent L Wang, Y Yang, R Min, S Chakradhar Neural Networks 93, 219-229, 2017 | 116 | 2017 |
Warp-level divergence in GPUs: Characterization, impact, and mitigation P Xiang, Y Yang, H Zhou 2014 IEEE 20th International Symposium on High Performance Computer …, 2014 | 92 | 2014 |
CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications Y Yang, H Zhou ACM SIGPLAN Notices 49 (8), 93-106, 2014 | 91 | 2014 |
Memory efficiency for convolutional neural networks operating on graphics processing units Y Yang, C Li, M Feng, S Chakradhar US Patent 10,489,703, 2019 | 78 | 2019 |
BLASX: A high performance level-3 BLAS library for heterogeneous multi-GPU computing L Wang, W Wu, Z Xu, J Xiao, Y Yang Proceedings of the 2016 International Conference on Supercomputing, 1-11, 2016 | 77 | 2016 |
Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput Y Yang, P Xiang, M Mantor, N Rubin, H Zhou Proceedings of the 21st international conference on Parallel architectures …, 2012 | 77 | 2012 |
Locality principle revisited: A probability-based quantitative approach S Gupta, P Xiang, Y Yang, H Zhou Journal of Parallel and Distributed Computing, 2013 | 58 | 2013 |
Locality Principle Revisited: A Probability-Based Quantitative Approach S Gupta, P Xiang, Y Yang, H Zhou IEEE International Parallel & Distributed Processing Symposium, 995 - 1009, 2012 | 58 | 2012 |
Accelerating MATLAB image processing toolbox functions on GPUs J Kong, M Dimitrov, Y Yang, J Liyanage, L Cao, J Staples, M Mantor, ... Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics …, 2010 | 57 | 2010 |
Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement P Xiang, Y Yang, M Mantor, N Rubin, LR Hsu, H Zhou ICS, 433-442, 2013 | 47 | 2013 |
Automatic data placement into GPU on-chip memory resources C Li, Y Yang, Z Lin, H Zhou 2015 IEEE/ACM International Symposium on Code Generation and Optimization …, 2015 | 42 | 2015 |
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs C Li, Y Yang, H Dai, S Yan, F Mueller, H Zhou 2014 IEEE International Symposium on Performance Analysis of Systems and …, 2014 | 42 | 2014 |
A unified optimizing compiler framework for different GPGPU architectures Y Yang, P Xiang, J Kong, M Mantor, H Zhou ACM Transactions on Architecture and Code Optimization (TACO) 9 (2), 1-33, 2012 | 39 | 2012 |
Tasks integrated networks: Joint detection and retrieval for image search L Zhang, Z He, Y Yang, L Wang, X Gao IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (1), 456-473, 2020 | 38 | 2020 |
Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs Y Yang, P Xiang, M Mantor, H Zhou International Conference on Parallel Processing, 2012 | 34 | 2012 |
Accelerating deep neural network training with inconsistent stochastic gradient descent L Wang, Y Yang, R Min, S Chakradhar US Patent 10,572,800, 2020 | 26 | 2020 |
A case for a flexible scalar unit in SIMT architecture Y Yang, P Xiang, M Mantor, N Rubin, L Hsu, Q Dong, H Zhou 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014 | 23 | 2014 |