Gunho Park

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

Feb 28, 2024
June Yong Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, Se Jung Kwon, Dongsoo Lee

Via arXiv

nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models

Jun 20, 2022
Gunho Park, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee

Via arXiv