Amir Gholami

AI and Memory Wall

Mar 21, 2024
Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer


KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Feb 07, 2024
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami


An LLM Compiler for Parallel Function Calling

Dec 07, 2023
Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami


SPEED: Speculative Pipelined Execution for Efficient Decoding

Oct 18, 2023
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao


SqueezeLLM: Dense-and-Sparse Quantization

Jun 13, 2023
Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer


Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

Jun 01, 2023
Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael Mahoney, Amir Gholami


End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs

Apr 13, 2023
Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W. Mahoney, Jovan Mitrevski, Nhan Tran


Full Stack Optimization of Transformer Inference: a Survey

Feb 27, 2023
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami


Big Little Transformer Decoder

Feb 15, 2023
Sehoon Kim, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
