Title: Dissecting Long Reasoning Models: An Empirical Study

Jun 05, 2025

Yongyu Mu, Jiali Zeng, Bei Li, Xinyan Guan, Fandong Meng, Jie Zhou, Tong Xiao, Jingbo Zhu



Abstract: Despite recent progress in training long-context reasoning models via reinforcement learning (RL), several open questions and counterintuitive behaviors remain. This work focuses on three key aspects: (1) We systematically analyze the roles of positive and negative samples in RL, revealing that positive samples mainly facilitate data fitting, whereas negative samples significantly enhance generalization and robustness. Interestingly, training solely on negative samples can rival standard RL training performance. (2) We identify substantial data inefficiency in group relative policy optimization, where over half of the samples yield zero advantage. To address this, we explore two straightforward strategies, including relative length rewards and offline sample injection, to better leverage these data and enhance reasoning efficiency and capability. (3) We investigate unstable performance across various reasoning models and benchmarks, attributing instability to uncertain problems with ambiguous outcomes, and demonstrate that multiple evaluation runs mitigate this issue.
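
The zero-advantage issue in point (2) can be illustrated with a minimal sketch. It assumes the commonly described form of group relative policy optimization, in which each sampled response's reward is normalized by the mean and standard deviation of its own group, and it assumes binary correctness rewards. The function names, the `eps` constant, and the toy reward values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by its group's mean and standard deviation.

    Assumed formulation: A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def zero_advantage_fraction(groups):
    """Fraction of samples that sit in groups whose rewards are all identical.

    Such groups have zero variance, so every advantage in them is zero
    and the samples contribute no gradient signal.
    """
    total = sum(len(g) for g in groups)
    wasted = sum(len(g) for g in groups if len(set(g)) == 1)
    return wasted / total


# Toy data (hypothetical): one group of sampled responses per prompt.
groups = [
    [1.0, 1.0, 1.0, 1.0],  # every response correct
    [0.0, 0.0, 0.0, 0.0],  # every response wrong
    [1.0, 0.0, 0.0, 1.0],  # mixed outcomes
]

for g in groups:
    print(g, group_relative_advantages(g))
print("zero-advantage fraction:", zero_advantage_fraction(groups))
```

Under these assumptions, only the mixed group produces non-zero advantages. Prompts that are answered uniformly correctly or uniformly incorrectly are the samples that the relative length rewards and offline sample injection described in the abstract aim to put to use.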

* Work in progress
