Se Jung Kwon

Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models

Jun 04, 2025

An Investigation of FP8 Across Accelerators for LLM Inference

Feb 03, 2025

Debunking the CUDA Myth Towards GPU-based AI Systems

Dec 31, 2024

LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

Jul 16, 2024

To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability

May 29, 2024

HyperCLOVA X Technical Report

Apr 13, 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

Feb 28, 2024

Label-Noise Robust Diffusion Models

Feb 27, 2024

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

Sep 27, 2023

FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization

Jun 01, 2023