Deokgyu Yoon

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards

May 20, 2026
Via arXiv