Renjie Mao

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Jun 09, 2026
Via arXiv