Eduard Durech

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning

Apr 26, 2026
Via arXiv