TASO: optimizing deep learning computation with automatic generation of graph substitutions Z Jia, O Padon, J Thomas, T Warszawski, M Zaharia, A Aiken Proceedings of the 27th ACM Symposium on Operating Systems Principles, 47-62, 2019 | 235 | 2019 |

Weld: A common runtime for high performance data analytics S Palkar, JJ Thomas, A Shanbhag, D Narayanan, H Pirk, M Schwarzkopf, ... Conference on Innovative Data Systems Research (CIDR) 19, 2017 | 172 | 2017 |

Evaluating end-to-end optimization for data analytics applications in Weld S Palkar, J Thomas, D Narayanan, P Thaker, R Palamuttam, P Negi, ... Proceedings of the VLDB Endowment 11 (9), 1002-1015, 2018 | 93 | 2018 |

Optimizing DNN computation with relaxed graph substitutions Z Jia, J Thomas, T Warszawski, M Gao, M Zaharia, A Aiken SysML 2019, 2019 | 81 | 2019 |

Fleet: A framework for massively parallel streaming on FPGAs J Thomas, P Hanrahan, M Zaharia Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020 | 41 | 2020 |

Creating an agile hardware design flow R Bahr, C Barrett, N Bhagdikar, A Carsello, R Daly, C Donovick, D Durst, ... 2020 57th ACM/IEEE Design Automation Conference (DAC), 1-6, 2020 | 29 | 2020 |

Weld: Rethinking the interface between data-intensive applications S Palkar, J Thomas, D Narayanan, A Shanbhag, R Palamuttam, H Pirk, ... arXiv preprint arXiv:1709.06416, 2017 | 24 | 2017 |

Aha: An agile approach to the design of coarse-grained reconfigurable accelerators and compilers K Koul, J Melchert, K Sreedhar, L Truong, G Nyengele, K Zhang, Q Liu, ... ACM Transactions on Embedded Computing Systems 22 (2), 1-34, 2023 | 15 | 2023 |

Amber: A 367 GOPS, 538 GOPS/W 16nm SoC with a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra A Carsello, K Feng, T Kong, K Koul, Q Liu, J Melchert, G Nyengele, ... 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and …, 2022 | 14 | 2022 |

Laika: Efficient in-place scheduling for 3D mesh graph computations P Gruevski, W Hasenplaugh, D Lugato, JJ Thomas Proceedings of the 30th on Symposium on Parallelism in Algorithms and …, 2018 | 6 | 2018 |

Software-like Compilation for Data Center FPGA Accelerators J Thomas, C Lavin, A Kaviani Proceedings of the 11th International Symposium on Highly Efficient …, 2021 | 4 | 2021 |

Nested vector language: Roofline performance for data parallel code S Palkar, J Thomas, M Zaharia | 4 | 2016 |

mflowgen: A modular flow generator and ecosystem for community-driven physical design A Carsello, J Thomas, A Nayak, PH Chen, M Horowitz, P Raina, C Torng Proceedings of the 59th ACM/IEEE Design Automation Conference, 1339-1342, 2022 | 3 | 2022 |

Enabling Reusable Physical Design Flows with Modular Flow Generators A Carsello, J Thomas, A Nayak, PH Chen, M Horowitz, P Raina, C Torng arXiv preprint arXiv:2111.14535, 2021 | 2 | 2021 |

Amber: Coarse-Grained Reconfigurable Array-Based SoC for Dense Linear Algebra Acceleration K Feng, A Carsello, T Kong, K Koul, Q Liu, J Melchert, G Nyengele, ... 2022 IEEE Hot Chips 34 Symposium (HCS), 1-30, 2022 | 1 | 2022 |

Developing FPGAs as an Acceleration Platform for Data-Intensive Applications JJ Thomas Stanford University, 2022 | | 2022 |

Weld: fast data-parallel computation on modern hardware JJ Thomas Massachusetts Institute of Technology, 2016 | | 2016 |