Nicholas Stranges

Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training

May 12, 2026
Via arXiv