
Byeongwook Kim

Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models

Jun 04, 2025

An Investigation of FP8 Across Accelerators for LLM Inference

Feb 03, 2025

To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability

May 29, 2024

HyperCLOVA X Technical Report

Apr 13, 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

Feb 28, 2024

DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation

Feb 27, 2024

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

Sep 27, 2023

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

Oct 08, 2022

nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models

Jun 20, 2022

Modulating Regularization Frequency for Efficient Compression-Aware Model Training

May 05, 2021