
Xiaobing Zhou

SAPO: Self-Adaptive Process Optimization Makes Small Reasoners Stronger

Jan 28, 2026

Kaiyuan Chen, Guangmin Zheng, Jin Wang, Xiaobing Zhou, Xuejie Zhang

Abstract: Existing self-evolution methods overlook the influence of fine-grained reasoning steps, which leads to a reasoner-verifier gap. The computational inefficiency of Monte Carlo (MC) process supervision makes this gap even harder to close. Motivated by Error-Related Negativity (ERN), through which a reasoner can localize an error after an incorrect decision and rapidly adjust, we propose Self-Adaptive Process Optimization (SAPO), a self-improvement method for Small Language Models (SLMs). SAPO introduces process supervision signals adaptively and efficiently by actively minimizing the reasoner-verifier gap rather than relying on inefficient MC estimation. Extensive experiments show that SAPO outperforms most existing self-evolution methods on two challenging task types: mathematics and code. Additionally, to further investigate SAPO's impact on verifier performance, this work introduces two new benchmarks for process reward models covering mathematical and coding tasks.
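One common way to implement the Monte Carlo process supervision that the abstract calls inefficient is to label each reasoning-step prefix with the fraction of sampled completions that reach a correct final answer. The sketch below is a generic illustration of that baseline, not SAPO itself. `mc_step_values` and `toy_rollout` are hypothetical names, and `toy_rollout` is an assumed stand-in for sampling a completion from a model and checking its answer. The sketch makes the cost explicit: labeling a solution with k steps takes k times n rollouts.

```python
from typing import Callable, List, Tuple


def mc_step_values(
    steps: List[str],
    rollout: Callable[[List[str]], bool],
    n_rollouts: int = 8,
) -> Tuple[List[float], int]:
    """Monte Carlo estimate of each step prefix's value.

    For every prefix steps[:k], run `n_rollouts` completions and record the
    fraction that end in a correct answer. Also return the total number of
    rollout calls, which is what makes this kind of supervision expensive.
    """
    values: List[float] = []
    calls = 0
    for k in range(1, len(steps) + 1):
        prefix = steps[:k]
        hits = 0
        for _ in range(n_rollouts):
            hits += int(rollout(prefix))  # True counts as one correct completion
            calls += 1
        values.append(hits / n_rollouts)
    return values, calls


def toy_rollout(prefix: List[str]) -> bool:
    # Hypothetical stand-in for "sample a completion and check the answer":
    # a completion succeeds only if no step so far is flagged as an error.
    return not any(step.startswith("ERR") for step in prefix)


if __name__ == "__main__":
    steps = ["x = 3", "y = x + 2", "ERR y = 4", "answer = y"]
    values, calls = mc_step_values(steps, toy_rollout, n_rollouts=8)
    print(values)  # [1.0, 1.0, 0.0, 0.0]
    print(calls)   # 32
```

With 4 steps and 8 rollouts per prefix, the toy run issues 32 rollout calls. With a real model, each call is a full generation, which is the overhead SAPO is described as avoiding.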

* Accepted by AAAI 2026

Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling

May 16, 2024

Guangmin Zheng, Jin Wang, Xiaobing Zhou, Xuejie Zhang

Figures 1 to 4 for Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling

Abstract: Chain of thought (CoT) has proven useful for problems that require complex reasoning, many of which are multimodal as well as textual. Given inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of hallucination, generated rationales can have high textual quality but illogical semantics. Such soft negative rationales do not always improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples whose text is highly similar to the original but whose semantics differ. A bidirectional margin loss (BML) was used to incorporate these samples into the conventional contrastive learning framework, which otherwise involves only positive and negative samples. Extensive experiments on the ScienceQA dataset demonstrate the effectiveness of the proposed method. Code and data are released at https://github.com/zgMin/SNSE-CoT.
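The abstract does not give the form of the bidirectional margin loss. As a rough illustration only, the sketch below shows a generic two-sided hinge constraint that places a soft negative's similarity score between the scores of the positive and the hard negative. The function name `two_sided_margin_loss`, the margin values, and the use of plain similarity scores are all assumptions, not the paper's definition of BML.

```python
def hinge(x: float) -> float:
    """Standard hinge: zero when the constraint is satisfied."""
    return max(0.0, x)


def two_sided_margin_loss(
    s_pos: float,
    s_soft: float,
    s_neg: float,
    m_upper: float = 0.1,
    m_lower: float = 0.1,
) -> float:
    """Hypothetical two-sided margin loss on similarity scores.

    Upper side: the soft negative should score at least `m_upper` below the positive.
    Lower side: the soft negative should score at least `m_lower` above the hard negative.
    """
    upper = hinge(m_upper - (s_pos - s_soft))
    lower = hinge(m_lower - (s_soft - s_neg))
    return upper + lower


if __name__ == "__main__":
    # Soft negative sits comfortably between positive and hard negative: no loss.
    print(two_sided_margin_loss(0.9, 0.5, 0.1))  # 0.0
    # Soft negative scores above the positive: the upper-side term is penalized.
    print(two_sided_margin_loss(0.5, 0.9, 0.1))
```

Penalizing both sides, rather than only pushing the soft negative away, reflects the idea that soft negatives are textually close to the original and should not be treated like ordinary negatives.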

* Accepted by LREC-COLING 2024