Mark Kurtz

An Interpretable Latency Model for Speculative Decoding in LLM Serving

May 14, 2026

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Nov 04, 2024

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

May 06, 2024

oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

Apr 04, 2023

Sparse*BERT: Sparse Models are Robust

May 25, 2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Mar 14, 2022

How Well Do Sparse Imagenet Models Transfer?

Dec 23, 2021