Clive Bai

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Feb 12, 2026
Via arXiv

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Feb 02, 2026
Via arXiv

ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning

Jan 13, 2026
Via arXiv