Jiaran Zhang

Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model

Dec 25, 2025
Via arXiv