Markus Nagel

The LLM Surgeon
Dec 28, 2023

MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device
Oct 02, 2023

Softmax Bias Correction for Quantized Generative Models
Sep 04, 2023

ResQ: Residual Quantization for Video Perception
Aug 18, 2023

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jul 10, 2023

Pruning vs Quantization: Which is Better?
Jul 06, 2023

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Jun 22, 2023

FP8 versus INT8 for efficient deep learning inference
Mar 31, 2023

A Practical Mixed Precision Algorithm for Post-Training Quantization
Feb 10, 2023

Quadapter: Adapter for GPT-2 Quantization
Nov 30, 2022