Yaxiang Zhang

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL

Feb 06, 2026
Via arXiv

Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It

Feb 02, 2026
Via arXiv