Yuandong Tian

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

STEM: Scaling Transformers with Embedding Modules

Jan 15, 2026
Via arXiv

The Path Not Taken: RLVR Provably Learns Off the Principals

Nov 11, 2025
Via arXiv

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Oct 10, 2025
Via arXiv

Positional Encoding via Token-Aware Phase Attention

Sep 16, 2025
Via arXiv

Language Self-Play For Data-Free Training

Sep 09, 2025
Via arXiv

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

May 18, 2025
Via arXiv

GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection

Apr 29, 2025
Via arXiv

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference

Apr 28, 2025
Via arXiv

ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost

Apr 23, 2025
Via arXiv