Sehoon Kim
Title    Cited by    Year
A survey of quantization methods for efficient neural network inference
A Gholami, S Kim, Z Dong, Z Yao, MW Mahoney, K Keutzer
Low-Power Computer Vision, 291-326, 2022
Cited by 826 · 2022
I-BERT: Integer-only BERT quantization
S Kim, A Gholami, Z Yao, MW Mahoney, K Keutzer
International Conference on Machine Learning, 5506-5518, 2021
Cited by 245 · 2021
Learned Token Pruning for Transformers
S Kim, S Shen, D Thorsley, A Gholami, W Kwon, J Hassoun, K Keutzer
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022
Cited by 85 · 2022
AI and Memory Wall
A Gholami, Z Yao, S Kim, M Mahoney, K Keutzer
RiseLab Blog Post, https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8, 2021
Cited by 61 · 2021
A Fast Post-Training Pruning Framework for Transformers
W Kwon, S Kim, MW Mahoney, J Hassoun, K Keutzer, A Gholami
Advances in Neural Information Processing Systems 35, 2022
Cited by 60 · 2022
Squeezeformer: An efficient transformer for automatic speech recognition
S Kim, A Gholami, A Shaw, N Lee, K Mangalam, J Malik, MW Mahoney, ...
Advances in Neural Information Processing Systems 35, 2022
Cited by 56 · 2022
SqueezeLLM: Dense-and-Sparse Quantization
S Kim, C Hooper, A Gholami, Z Dong, X Li, S Shen, MW Mahoney, ...
arXiv preprint arXiv:2306.07629, 2023
Cited by 50 · 2023
Hessian-aware pruning and optimal neural implant
S Yu, Z Yao, A Gholami, Z Dong, S Kim, MW Mahoney, K Keutzer
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2022
Cited by 49 · 2022
Applications and techniques for fast machine learning in science
AMC Deiana, N Tran, J Agar, M Blott, G Di Guglielmo, J Duarte, P Harris, ...
Frontiers in Big Data 5, 787421, 2022
Cited by 47 · 2022
Full Stack Optimization of Transformer Inference: a Survey
S Kim, C Hooper, T Wattanawong, M Kang, R Yan, H Genc, G Dinh, ...
arXiv preprint arXiv:2302.14017, 2023
Cited by 33 · 2023
Speculative Decoding with Big Little Decoder
S Kim, K Mangalam, S Moon, J Malik, MW Mahoney, A Gholami, ...
Thirty-seventh Conference on Neural Information Processing Systems, 2023
Cited by 28* · 2023
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition
S Kim, A Gholami, Z Yao, N Lee, P Wang, A Nrusimha, B Zhai, T Gao, ...
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by 19 · 2022
WindTunnel: towards differentiable ML pipelines beyond a single model
GI Yu, S Amizadeh, S Kim, A Pagnoni, C Zhang, BG Chun, M Weimer, ...
Proceedings of the VLDB Endowment 15 (1), 11-20, 2021
Cited by 12* · 2021
SPEED: Speculative Pipelined Execution for Efficient Decoding
C Hooper, S Kim, H Mohammadzadeh, H Genc, K Keutzer, A Gholami, ...
arXiv preprint arXiv:2310.12072, 2023
Cited by 6 · 2023
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
T Kim, E Jeong, GW Kim, Y Koo, S Kim, G Yu, BG Chun
Advances in Neural Information Processing Systems 34, 1468-1480, 2021
Cited by 5 · 2021
An LLM Compiler for Parallel Function Calling
S Kim, S Moon, R Tabrizi, N Lee, MW Mahoney, K Keutzer, A Gholami
arXiv preprint arXiv:2312.04511, 2023
Cited by 4 · 2023
Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms
J Xu, S Kim, B Nikolic, YS Shao
2021 IEEE International Symposium on Performance Analysis of Systems and …, 2021
Cited by 4 · 2021
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
N Lee, T Wattanawong, S Kim, K Mangalam, S Shen, G Anumanchipali, ...
arXiv preprint arXiv:2403.15042, 2024
2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
C Hooper, S Kim, H Mohammadzadeh, MW Mahoney, YS Shao, ...
arXiv preprint arXiv:2401.18079, 2024
2024
Learned Best-Effort LLM Serving
S Jha, C Hooper, X Liu, S Kim, K Keutzer
arXiv preprint arXiv:2401.07886, 2024
2024
Articles 1–20