SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song, Z Xu, T Kraska Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of …, 2018 | 285 | 2018 |
Hierarchical DAG scheduling for hybrid distributed systems W Wu, A Bouteiller, G Bosilca, M Faverge, J Dongarra 2015 IEEE International Parallel and Distributed Processing Symposium, 156-165, 2015 | 92 | 2015 |
Unity: Accelerating DNN training through joint optimization of algebraic transformations and parallelization C Unger, Z Jia, W Wu, S Lin, M Baines, CEQ Narvaez, V Ramakrishnaiah, ... 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022 | 83 | 2022 |
BLASX: A high performance level-3 BLAS library for heterogeneous multi-GPU computing L Wang, W Wu, Z Xu, J Xiao, Y Yang Proceedings of the 2016 International Conference on Supercomputing, 1-11, 2016 | 77 | 2016 |
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance E Slaughter, W Wu, Y Fu, L Brandenburg, N Garcia, W Kautz, E Marx, ... SC '20: Proceedings of the International Conference for High Performance …, 2020 | 62 | 2020 |
O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference T Geng, A Li, T Wang, C Wu, Y Li, R Shi, W Wu, MC Herbordt IEEE Transactions on Parallel & Distributed Systems, 1-1, 2020 | 40 | 2020 |
GPU-aware non-contiguous data movement in Open MPI W Wu, G Bosilca, R vandeVaart, S Jeaugey, J Dongarra International Symposium on High-Performance Parallel and Distributed …, 2016 | 31* | 2016 |
FFT-based gradient sparsification for the distributed training of deep neural networks L Wang, W Wu, J Zhang, H Liu, G Bosilca, M Herlihy, R Fonseca Proceedings of the 29th International Symposium on High-Performance Parallel …, 2020 | 29 | 2020 |
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning T Geng, T Wang, C Wu, C Yang, W Wu, A Li, MC Herbordt Proceeding ICS '19 Proceedings of the ACM International Conference on …, 2019 | 29 | 2019 |
Implementing directed acyclic graphs with the heterogeneous system architecture S Puthoor, AM Aji, S Che, M Daga, W Wu, BM Beckmann, G Rodgers Proceedings of the 9th Annual Workshop on General Purpose Processing using …, 2016 | 29 | 2016 |
Adapt: An event-based adaptive collective communication framework X Luo, W Wu, G Bosilca, T Patinyasakdikul, L Wang, J Dongarra Proceedings of the 27th International Symposium on High-Performance Parallel …, 2018 | 28 | 2018 |
Virtual reality based robotics learning system X Yang, Y Zhao, W Wu, H Wang 2008 IEEE International Conference on Automation and Logistics, 859-864, 2008 | 27 | 2008 |
HAN: A hierarchical autotuned collective communication framework X Luo, W Wu, G Bosilca, Y Pei, Q Cao, T Patinyasakdikul, D Zhong, ... 2020 IEEE International Conference on Cluster Computing (CLUSTER), 23-34, 2020 | 25 | 2020 |
A scientific function test framework for modular environmental model development: application to the community land model D Wang, T Janjusic, C Iversen, P Thornton, M Karssovski, W Wu, Y Xu 2015 IEEE/ACM 1st International Workshop on Software Engineering for High …, 2015 | 21 | 2015 |
Efficient Communications in Training Large Scale Neural Networks Y Zhao, L Wang, W Wu, G Bosilca, R Vuduc, J Ye, W Tang, Z Xu Thematic Workshops '17 Proceedings of the on Thematic Workshops of ACM …, 2017 | 15* | 2017 |
A web-based visual analytic framework for understanding large-scale environmental models: A use case for the community land model Y Xu, D Wang, T Janjusic, W Wu, Y Pei, Z Yao Procedia Computer Science 108, 1731-1740, 2017 | 8 | 2017 |
Flexible data redistribution in a task-based runtime system Q Cao, G Bosilca, W Wu, D Zhong, A Bouteiller, J Dongarra 2020 IEEE International Conference on Cluster Computing (CLUSTER), 221-225, 2020 | 7 | 2020 |
Algorithms for modeling structural changes in human chromosomes X Yang, W Wu, CC Tseng Computer methods and programs in biomedicine 110 (2), 171-182, 2013 | 7 | 2013 |
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks (2018) L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song, Z Xu, T Kraska | 5 | 2018 |
Compiler technologies for understanding legacy scientific code: A case study on an ACME land module D Wang, Y Pei, O Hernandez, W Wu, Z Yao, Y Kim, M Wolfe, R Kitchen Procedia Computer Science 108, 2418-2422, 2017 | 5 | 2017 |