Alexis Limozin

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning

Apr 26, 2026
Via arXiv