KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Jan 31, 2024
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami