Wei Wu
Title
Cited by
Year
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song, Z Xu, T Kraska
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of …, 2018
Cited by: 285; Year: 2018
Hierarchical dag scheduling for hybrid distributed systems
W Wu, A Bouteiller, G Bosilca, M Faverge, J Dongarra
2015 IEEE International Parallel and Distributed Processing Symposium, 156-165, 2015
Cited by: 92; Year: 2015
Unity: Accelerating DNN training through joint optimization of algebraic transformations and parallelization
C Unger, Z Jia, W Wu, S Lin, M Baines, CEQ Narvaez, V Ramakrishnaiah, ...
16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022
Cited by: 83; Year: 2022
Blasx: A high performance level-3 blas library for heterogeneous multi-gpu computing
L Wang, W Wu, Z Xu, J Xiao, Y Yang
Proceedings of the 2016 International Conference on Supercomputing, 1-11, 2016
Cited by: 77; Year: 2016
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance
E Slaughter, W Wu, Y Fu, L Brandenburg, N Garcia, W Kautz, E Marx, ...
SC '20: Proceedings of the International Conference for High Performance …, 2020
Cited by: 62; Year: 2020
O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference
T Geng, A Li, T Wang, C Wu, Y Li, R Shi, W Wu, MC Herbordt
IEEE Transactions on Parallel & Distributed Systems, 1-1, 2020
Cited by: 40; Year: 2020
GPU-aware non-contiguous data movement in Open MPI
W Wu, G Bosilca, R vandeVaart, S Jeaugey, J Dongarra
International Symposium on High-Performance Parallel and Distributed …, 2016
Cited by: 31*; Year: 2016
FFT-based gradient sparsification for the distributed training of deep neural networks
L Wang, W Wu, J Zhang, H Liu, G Bosilca, M Herlihy, R Fonseca
Proceedings of the 29th International Symposium on High-Performance Parallel …, 2020
Cited by: 29; Year: 2020
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning
T Geng, T Wang, C Wu, C Yang, W Wu, A Li, MC Herbordt
Proceeding ICS '19 Proceedings of the ACM International Conference on …, 2019
Cited by: 29; Year: 2019
Implementing directed acyclic graphs with the heterogeneous system architecture
S Puthoor, AM Aji, S Che, M Daga, W Wu, BM Beckmann, G Rodgers
Proceedings of the 9th Annual Workshop on General Purpose Processing using …, 2016
Cited by: 29; Year: 2016
Adapt: An event-based adaptive collective communication framework
X Luo, W Wu, G Bosilca, T Patinyasakdikul, L Wang, J Dongarra
Proceedings of the 27th International Symposium on High-Performance Parallel …, 2018
Cited by: 28; Year: 2018
Virtual reality based robotics learning system
X Yang, Y Zhao, W Wu, H Wang
2008 IEEE International Conference on Automation and Logistics, 859-864, 2008
Cited by: 27; Year: 2008
HAN: A hierarchical autotuned collective communication framework
X Luo, W Wu, G Bosilca, Y Pei, Q Cao, T Patinyasakdikul, D Zhong, ...
2020 IEEE International Conference on Cluster Computing (CLUSTER), 23-34, 2020
Cited by: 25; Year: 2020
A scientific function test framework for modular environmental model development: application to the community land model
D Wang, T Janjusic, C Iversen, P Thornton, M Karssovski, W Wu, Y Xu
2015 IEEE/ACM 1st International Workshop on Software Engineering for High …, 2015
Cited by: 21; Year: 2015
Efficient Communications in Training Large Scale Neural Networks
Y Zhao, L Wang, W Wu, G Bosilca, R Vuduc, J Ye, W Tang, Z Xu
Thematic Workshops '17 Proceedings of the on Thematic Workshops of ACM …, 2017
Cited by: 15*; Year: 2017
A web-based visual analytic framework for understanding large-scale environmental models: A use case for the community land model
Y Xu, D Wang, T Janjusic, W Wu, Y Pei, Z Yao
Procedia Computer Science 108, 1731-1740, 2017
Cited by: 8; Year: 2017
Flexible data redistribution in a task-based runtime system
Q Cao, G Bosilca, W Wu, D Zhong, A Bouteiller, J Dongarra
2020 IEEE International Conference on Cluster Computing (CLUSTER), 221-225, 2020
Cited by: 7; Year: 2020
Algorithms for modeling structural changes in human chromosomes
X Yang, W Wu, CC Tseng
Computer methods and programs in biomedicine 110 (2), 171-182, 2013
Cited by: 7; Year: 2013
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks (2018)
L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song, Z Xu, T Kraska
Cited by: 5; Year: 2018
Compiler technologies for understanding legacy scientific code: A case study on an ACME land module
D Wang, Y Pei, O Hernandez, W Wu, Z Yao, Y Kim, M Wolfe, R Kitchen
Procedia Computer Science 108, 2418-2422, 2017
Cited by: 5; Year: 2017