Title: Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model

Dec 25, 2025

Yanhao Li, Lu Ma, Jiaran Zhang, Lexiang Tang, Wentao Zhang, Guibo Luo

Abstract: Existing approaches to controlling reasoning length typically rely on fixed length penalties, but such penalties are hard to tune and fail to adapt to the evolving reasoning abilities of LLMs, leading to suboptimal trade-offs between accuracy and conciseness. To address this challenge, we propose Leash (adaptive LEngth penAlty and reward SHaping), a reinforcement learning framework for efficient reasoning in LLMs. We formulate length control as a constrained optimization problem and employ a Lagrangian primal-dual method to dynamically adjust the penalty coefficient. When generations exceed the target length, the penalty is intensified; when they are shorter, it is relaxed. This adaptive mechanism guides models toward producing concise reasoning without sacrificing task performance. Experiments on DeepSeek-R1-Distill-Qwen-1.5B and Qwen3-4B-Thinking-2507 show that Leash reduces the average reasoning length by 60% across diverse tasks, including in-distribution mathematical reasoning and out-of-distribution domains such as coding and instruction following, while maintaining competitive performance. Our work thus presents a practical and effective paradigm for developing controllable and efficient LLMs that balance reasoning capabilities with computational budgets.
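
The abstract describes the adaptive penalty only in words, so the following is a minimal sketch of how such a mechanism might look, assuming a standard projected dual-ascent step on a length constraint. It is not the authors' implementation: the class name `DualLengthPenalty`, the parameters `target_len`, `lr`, and `lam_max`, the normalization by the target length, and the linear form of the shaped reward are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DualLengthPenalty:
    """Illustrative adaptive length penalty (hypothetical, not from the paper)."""

    target_len: float      # assumed target length budget, in tokens
    lr: float = 0.05       # assumed step size for the dual (penalty) update
    lam: float = 0.0       # penalty coefficient, i.e. the Lagrange multiplier
    lam_max: float = 1.0   # assumed upper bound that keeps the penalty finite

    def update(self, mean_len: float) -> float:
        """Dual step: raise lam when the batch runs long, lower it when short."""
        # Normalized constraint violation: positive above target, negative below.
        violation = (mean_len - self.target_len) / self.target_len
        # Projected update keeps the multiplier inside [0, lam_max].
        self.lam = min(self.lam_max, max(0.0, self.lam + self.lr * violation))
        return self.lam

    def shaped_reward(self, task_reward: float, length: int) -> float:
        """Primal side: task reward minus the current length penalty."""
        excess = (length - self.target_len) / self.target_len
        return task_reward - self.lam * excess


if __name__ == "__main__":
    penalty = DualLengthPenalty(target_len=1000.0, lr=0.1)
    # Made-up mean generation lengths for three consecutive training batches.
    for mean_len in (2000.0, 2000.0, 500.0):
        lam = penalty.update(mean_len)
        print(f"mean_len={mean_len:.0f} lam={lam:.3f}")
```

In this sketch the coefficient grows while batches exceed the target and shrinks once they fall below it, which mirrors the intensify-and-relax behaviour described in the abstract. A policy-gradient learner would then be trained on `shaped_reward` in place of the raw task reward.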
