TASO: optimizing deep learning computation with automatic generation of graph substitutions
Z Jia, O Padon, J Thomas, T Warszawski, M Zaharia, A Aiken
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 47-62, 2019
Weld: A common runtime for high performance data analytics
S Palkar, JJ Thomas, A Shanbhag, D Narayanan, H Pirk, M Schwarzkopf, ...
Conference on Innovative Data Systems Research (CIDR) 19, 2017
Evaluating end-to-end optimization for data analytics applications in Weld
S Palkar, J Thomas, D Narayanan, P Thaker, R Palamuttam, P Negi, ...
Proceedings of the VLDB Endowment 11 (9), 1002-1015, 2018
Optimizing DNN computation with relaxed graph substitutions
Z Jia, J Thomas, T Warszawski, M Gao, M Zaharia, A Aiken
SysML 2019, 2019
Fleet: A framework for massively parallel streaming on FPGAs
J Thomas, P Hanrahan, M Zaharia
Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020
Creating an agile hardware design flow
R Bahr, C Barrett, N Bhagdikar, A Carsello, R Daly, C Donovick, D Durst, ...
2020 57th ACM/IEEE Design Automation Conference (DAC), 1-6, 2020
Weld: Rethinking the interface between data-intensive applications
S Palkar, J Thomas, D Narayanan, A Shanbhag, R Palamuttam, H Pirk, ...
arXiv preprint arXiv:1709.06416, 2017
Aha: An agile approach to the design of coarse-grained reconfigurable accelerators and compilers
K Koul, J Melchert, K Sreedhar, L Truong, G Nyengele, K Zhang, Q Liu, ...
ACM Transactions on Embedded Computing Systems 22 (2), 1-34, 2023
Amber: A 367 GOPS, 538 GOPS/W 16nm SoC with a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra
A Carsello, K Feng, T Kong, K Koul, Q Liu, J Melchert, G Nyengele, ...
2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and …, 2022
Laika: Efficient in-place scheduling for 3D mesh graph computations
P Gruevski, W Hasenplaugh, D Lugato, JJ Thomas
Proceedings of the 30th on Symposium on Parallelism in Algorithms and …, 2018
Software-like Compilation for Data Center FPGA Accelerators
J Thomas, C Lavin, A Kaviani
Proceedings of the 11th International Symposium on Highly Efficient …, 2021
Nested vector language: Roofline performance for data parallel code
S Palkar, J Thomas, M Zaharia
mflowgen: A modular flow generator and ecosystem for community-driven physical design
A Carsello, J Thomas, A Nayak, PH Chen, M Horowitz, P Raina, C Torng
Proceedings of the 59th ACM/IEEE Design Automation Conference, 1339-1342, 2022
Enabling Reusable Physical Design Flows with Modular Flow Generators
A Carsello, J Thomas, A Nayak, PH Chen, M Horowitz, P Raina, C Torng
arXiv preprint arXiv:2111.14535, 2021
Amber: Coarse-Grained Reconfigurable Array-Based SoC for Dense Linear Algebra Acceleration
K Feng, A Carsello, T Kong, K Koul, Q Liu, J Melchert, G Nyengele, ...
2022 IEEE Hot Chips 34 Symposium (HCS), 1-30, 2022
Developing FPGAs as an Acceleration Platform for Data-Intensive Applications
JJ Thomas
Stanford University, 2022
Weld: fast data-parallel computation on modern hardware
JJ Thomas
Massachusetts Institute of Technology, 2016
