Dongsoo Lee

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization (Feb 28, 2024)

DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation (Feb 27, 2024)

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models (Sep 27, 2023)

FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization (Jun 01, 2023)

Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization (May 23, 2023)

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models (Oct 08, 2022)

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation (Sep 22, 2022)

nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models (Jun 20, 2022)

Maximum Likelihood Training of Implicit Nonlinear Diffusion Models (May 27, 2022)

Modulating Regularization Frequency for Efficient Compression-Aware Model Training (May 05, 2021)