Markus Nagel

Efficient Reasoning on the Edge

Mar 17, 2026

Leech Lattice Vector Quantization for Efficient LLM Compression

Mar 11, 2026

Dissecting Quantization Error: A Concentration-Alignment Perspective

Mar 04, 2026

STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization

Oct 30, 2025

HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations

Jun 11, 2025

FPTQuant: Function-Preserving Transforms for LLM Quantization

Jun 05, 2025

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

Dec 02, 2024

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference

Nov 27, 2024

Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters

Jul 22, 2024

Sparse High Rank Adapters

Jun 19, 2024