Youssef Emad

Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls

Oct 02, 2025
Via arXiv

NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks

Jul 02, 2025
Via arXiv