Martin Jaggi

EPFL

Towards Fully FP8 GEMM LLM Training at Scale

May 26, 2025 (via arXiv)

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

May 26, 2025 (via arXiv)

URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training

May 22, 2025 (via arXiv)

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation

Apr 24, 2025 (via arXiv)

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

Apr 08, 2025 (via arXiv)

Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs

Feb 07, 2025 (via arXiv)

Leveraging the true depth of LLMs

Feb 05, 2025 (via arXiv)

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Oct 31, 2024 (via arXiv)

Improving Stochastic Cubic Newton with Momentum

Oct 25, 2024 (via arXiv)

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

Oct 07, 2024 (via arXiv)