
Martin Jaggi

EPFL

TiMoE: Time-Aware Mixture of Language Experts

Aug 12, 2025

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Jun 26, 2025

Gradient-Normalized Smoothness for Optimization with Approximate Hessians

Jun 16, 2025

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

May 26, 2025

Towards Fully FP8 GEMM LLM Training at Scale

May 26, 2025

URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training

May 22, 2025

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation

Apr 24, 2025

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

Apr 08, 2025

Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs

Feb 07, 2025

Leveraging the true depth of LLMs

Feb 05, 2025