Wei Wu

Cited by

	All	Since 2019
Citations	836	663
h-index	15	12
i10-index	15	12

Citations per year

Year	Citations
2010	4
2011	2
2012	3
2013	3
2014	4
2015	18
2016	26
2017	48
2018	57
2019	80
2020	112
2021	134
2022	131
2023	164
2024	42

Public access (based on funding mandates)

15 articles available
0 articles not available

Co-authors

George Bosilca	Research Assistant Professor, University of Tennessee, Knoxville	Verified email at icl.utk.edu
Linnan Wang	Brown University	Verified email at brown.edu
Zenglin Xu	Harbin Institute of Technology, Shenzhen & Peng Cheng Lab	Verified email at hit.edu.cn
Jack Dongarra	University of Tennessee; Oak Ridge National Laboratory; University of Manchester	Verified email at icl.utk.edu
Aurelien Bouteiller	University of Tennessee, Knoxville	Verified email at icl.utk.edu
Yang Xu (徐阳)	Associate Professor, Hong Kong Polytechnic University	Verified email at polyu.edu.hk
Mathieu Faverge	Bordeaux INP - ENSEIRB-MatMeca	Verified email at inria.fr
Dali Wang	ORNL	Verified email at ornl.gov

Wei Wu

NVIDIA

Verified email at nvidia.com

High Performance Computing, Distributed Computing


Title	Authors	Venue	Cited by	Year
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks	L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song, Z Xu, T Kraska	Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of …, 2018	259	2018
Hierarchical DAG scheduling for hybrid distributed systems	W Wu, A Bouteiller, G Bosilca, M Faverge, J Dongarra	2015 IEEE International Parallel and Distributed Processing Symposium, 156-165, 2015	90	2015
BLASX: A high performance level-3 BLAS library for heterogeneous multi-GPU computing	L Wang, W Wu, Z Xu, J Xiao, Y Yang	Proceedings of the 2016 International Conference on Supercomputing, 1-11, 2016	76	2016
Unity: Accelerating DNN training through joint optimization of algebraic transformations and parallelization	C Unger, Z Jia, W Wu, S Lin, M Baines, CEQ Narvaez, V Ramakrishnaiah, ...	16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022	58	2022
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance	E Slaughter, W Wu, Y Fu, L Brandenburg, N Garcia, W Kautz, E Marx, ...	SC '20: Proceedings of the International Conference for High Performance …, 2020	51	2020
O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference	T Geng, A Li, T Wang, C Wu, Y Li, R Shi, W Wu, MC Herbordt	IEEE Transactions on Parallel & Distributed Systems, 1-1, 2020	34	2020
Implementing directed acyclic graphs with the heterogeneous system architecture	S Puthoor, AM Aji, S Che, M Daga, W Wu, BM Beckmann, G Rodgers	Proceedings of the 9th Annual Workshop on General Purpose Processing using …, 2016	29	2016
GPU-aware non-contiguous data movement in Open MPI	W Wu, G Bosilca, R vandeVaart, S Jeaugey, J Dongarra	International Symposium on High-Performance Parallel and Distributed …, 2016	29*	2016
FFT-based gradient sparsification for the distributed training of deep neural networks	L Wang, W Wu, J Zhang, H Liu, G Bosilca, M Herlihy, R Fonseca	Proceedings of the 29th International Symposium on High-Performance Parallel …, 2020	28	2020
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning	T Geng, T Wang, C Wu, C Yang, W Wu, A Li, MC Herbordt	Proceeding ICS '19 Proceedings of the ACM International Conference on …, 2019	28	2019
Virtual reality based robotics learning system	X Yang, Y Zhao, W Wu, H Wang	2008 IEEE International Conference on Automation and Logistics, 859-864, 2008	26	2008
ADAPT: An event-based adaptive collective communication framework	X Luo, W Wu, G Bosilca, T Patinyasakdikul, L Wang, J Dongarra	Proceedings of the 27th International Symposium on High-Performance Parallel …, 2018	25	2018
A scientific function test framework for modular environmental model development: application to the community land model	D Wang, T Janjusic, C Iversen, P Thornton, M Karssovski, W Wu, Y Xu	2015 IEEE/ACM 1st International Workshop on Software Engineering for High …, 2015	21	2015
HAN: A hierarchical autotuned collective communication framework	X Luo, W Wu, G Bosilca, Y Pei, Q Cao, T Patinyasakdikul, D Zhong, ...	2020 IEEE International Conference on Cluster Computing (CLUSTER), 23-34, 2020	19	2020
Efficient Communications in Training Large Scale Neural Networks	Y Zhao, L Wang, W Wu, G Bosilca, R Vuduc, J Ye, W Tang, Z Xu	Thematic Workshops '17 Proceedings of the on Thematic Workshops of ACM …, 2017	15*	2017
A web-based visual analytic framework for understanding large-scale environmental models: A use case for the community land model	Y Xu, D Wang, T Janjusic, W Wu, Y Pei, Z Yao	Procedia Computer Science 108, 1731-1740, 2017	8	2017
Flexible data redistribution in a task-based runtime system	Q Cao, G Bosilca, W Wu, D Zhong, A Bouteiller, J Dongarra	2020 IEEE International Conference on Cluster Computing (CLUSTER), 221-225, 2020	7	2020
Algorithms for modeling structural changes in human chromosomes	X Yang, W Wu, CC Tseng	Computer methods and programs in biomedicine 110 (2), 171-182, 2013	7	2013
Compiler technologies for understanding legacy scientific code: A case study on an ACME land module	D Wang, Y Pei, O Hernandez, W Wu, Z Yao, Y Kim, M Wolfe, R Kitchen	Procedia Computer Science 108, 2418-2422, 2017	5	2017
Multifractal and singularity analysis of weighted road networks	M Dai, C Zhang, L Li, W Wu	International Journal of Modern Physics B 28 (30), 1450215, 2014	5	2014
