Martin Jaggi

EPFL

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Jun 26, 2025, via arXiv

Gradient-Normalized Smoothness for Optimization with Approximate Hessians

Jun 16, 2025, via arXiv

Towards Fully FP8 GEMM LLM Training at Scale

May 26, 2025, via arXiv

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

May 26, 2025, via arXiv

URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training

May 22, 2025, via arXiv

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation

Apr 24, 2025, via arXiv

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

Apr 08, 2025, via arXiv

Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs

Feb 07, 2025, via arXiv

Leveraging the true depth of LLMs

Feb 05, 2025, via arXiv

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Oct 31, 2024, via arXiv