Elias Frantar

Compression Scaling Laws: Unifying Sparsity and Quantization

Feb 23, 2025

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models

Aug 21, 2024

Extreme Compression of Large Language Models via Additive Quantization

Jan 11, 2024

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Oct 25, 2023

Sparse Fine-tuning for Inference Acceleration of Large Language Models

Oct 13, 2023

Towards End-to-end 4-Bit Inference on Generative Large Language Models

Oct 13, 2023

Scaling Laws for Sparsely-Connected Foundation Models

Sep 15, 2023

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

Aug 03, 2023

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models

Jul 07, 2023

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

Jun 05, 2023