
Clive Bai

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Feb 02, 2026
Via arXiv

ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning

Jan 13, 2026
Via arXiv