Qiaoming Zhu

Hint-Guided Diversified Policy Optimization for LLM Reasoning

Jun 02, 2026

Zhiyu Cao, Kaixin Wu, Mingjie Zhong, Peifeng Li, Xiaobo Li, Can Ye, Qiaoming Zhu

Abstract:Recent developments in Large Language Models (LLMs) have showcased impressive reasoning capabilities, with Reinforcement Learning with Verifiable Rewards (RLVR) being a promising enhancement strategy. However, existing reward mechanisms are constrained to outcome-level correctness and lack explicit signals to guide the model to consider diverse solutions. In contrast, human problem solving typically involves evaluating multiple potential approaches and selecting the most reliable solution, a cognitive process that current RLVR frameworks do not explicitly incentivize. Inspired by this, we propose Hint-Guided Diversified Policy Optimization (HDPO), allowing the model to first list all potential candidate solution outlines as hints and then select the most reliable one for further reasoning. HDPO comprises two stages, Cold Start for Structured Reasoning and Hint-Guided Diversified Reinforcement Learning, to incentivize the model to generate diverse and reliable solutions following the "propose-select-think" trajectory. Experimental results show that HDPO effectively boosts LLM reasoning and enhances the diversity of candidate solutions as well as the LLM's ability to identify reliable solutions.

Discourse Coherence and Response-Guided Context Rewriting for Multi-Party Dialogue Generation

Apr 08, 2026

Zhiyu Cao, Peifeng Li, Qiaoming Zhu

Abstract:Previous research on multi-party dialogue generation has predominantly leveraged structural information inherent in dialogues to directly inform the generation process. However, the prevalence of colloquial expressions and incomplete utterances in dialogues often impedes comprehension and weakens the fidelity of dialogue structure representations, which is particularly pronounced in multi-party dialogues. In this work, we propose a novel framework DRCR (Discourse coherence and Response-guided Context Rewriting) to improve multi-party dialogue generation through dialogue context rewriting. Specifically, DRCR employs two complementary feedback signals, discourse coherence and response quality, to construct preference data for both context rewriting and response generation. Moreover, we propose a dynamic self-evolution learning method that allows the rewriter and responder to continuously enhance their capabilities through mutual interaction in an iterative training loop. Comprehensive experiments conducted on four multi-party dialogue datasets substantiate the effectiveness of DRCR.

* ACL 2026 Main Conference

Multi-Faceted Self-Consistent Preference Alignment for Query Rewriting in Conversational Search

Apr 08, 2026

Zhiyu Cao, Peifeng Li, Qiaoming Zhu

Abstract:Conversational Query Rewriting (CQR) aims to rewrite ambiguous queries to achieve more efficient conversational search. Early studies have predominantly focused on the rewriting in isolation, ignoring the feedback from query rewrite, passage retrieval and response generation in the rewriting process. To address this issue, we propose Multi-Faceted Self-Consistent Preference Aligned CQR (MSPA-CQR). Specifically, we first construct self-consistent preference alignment data from three dimensions (rewriting, retrieval, and response) to generate more diverse rewritten queries. Then we propose prefix guided multi-faceted direct preference optimization to learn preference information from three different dimensions. The experimental results show that our MSPA-CQR is effective in both in- and out-of-distribution scenarios.

* ACL 2026 Findings

FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation

Oct 23, 2025

Wenhao Wang, Kehe Ye, Xinyu Zhou, Tianxing Chen, Cao Min, Qiaoming Zhu, Xiaokang Yang, Yongjian Shen, Yang Yang, Maoqing Yao (+1 more)

Abstract:Large-scale and diverse datasets are vital for training robust robotic manipulation policies, yet existing data collection methods struggle to balance scale, diversity, and quality. Simulation offers scalability but suffers from sim-to-real gaps, while teleoperation yields high-quality demonstrations with limited diversity and high labor cost. We introduce FieldGen, a field-guided data generation framework that enables scalable, diverse, and high-quality real-world data collection with minimal human supervision. FieldGen decomposes manipulation into two stages: a pre-manipulation phase, allowing trajectory diversity, and a fine manipulation phase requiring expert precision. Human demonstrations capture key contact and pose information, after which an attraction field automatically generates diverse trajectories converging to successful configurations. This decoupled design combines scalable trajectory diversity with precise supervision. Moreover, FieldGen-Reward augments generated data with reward annotations to further enhance policy learning. Experiments demonstrate that policies trained with FieldGen achieve higher success rates and improved stability compared to teleoperation-based baselines, while significantly reducing human effort in long-term real-world data collection. Webpage is available at https://fieldgen.github.io/.

* Webpage: https://fieldgen.github.io/

ICR: Iterative Clarification and Rewriting for Conversational Search

Sep 05, 2025

Zhiyu Cao, Peifeng Li, Qiaoming Zhu

Abstract:Most previous work on Conversational Query Rewriting employs an end-to-end rewriting paradigm. However, this approach is hindered by the issue of multiple fuzzy expressions within the query, which complicates the simultaneous identification and rewriting of multiple positions. To address this issue, we propose a novel framework ICR (Iterative Clarification and Rewriting), an iterative rewriting scheme that pivots on clarification questions. Within this framework, the model alternates between generating clarification questions and rewritten queries. The experimental results show that our ICR can continuously improve retrieval performance in the clarification-rewriting iterative process, thereby achieving state-of-the-art performance on two popular datasets.

Improving Dialogue Discourse Parsing through Discourse-aware Utterance Clarification

Jun 18, 2025

Yaxin Fan, Peifeng Li, Qiaoming Zhu

Abstract:Dialogue discourse parsing aims to identify and analyze discourse relations between the utterances within dialogues. However, linguistic features in dialogues, such as omission and idiom, frequently introduce ambiguities that obscure the intended discourse relations, posing significant challenges for parsers. To address this issue, we propose a Discourse-aware Clarification Module (DCM) to enhance the performance of the dialogue discourse parser. DCM employs two distinct reasoning processes: clarification type reasoning and discourse goal reasoning. The former analyzes linguistic features, while the latter distinguishes the intended relation from the ambiguous one. Furthermore, we introduce Contribution-aware Preference Optimization (CPO) to mitigate the risk of erroneous clarifications, thereby reducing cascading errors. CPO enables the parser to assess the contributions of the clarifications from DCM and provide feedback to optimize the DCM, enhancing its adaptability and alignment with the parser's requirements. Extensive experiments on the STAC and Molweni datasets demonstrate that our approach effectively resolves ambiguities and significantly outperforms the state-of-the-art (SOTA) baselines.

* Accepted by ACL 2025 (main conference)

Enhancing Goal-oriented Proactive Dialogue Systems via Consistency Reflection and Correction

Jun 16, 2025

Didi Zhang, Yaxin Fan, Peifeng Li, Qiaoming Zhu

Abstract:This paper proposes a consistency reflection and correction method for goal-oriented dialogue systems.

Two-stage Incomplete Utterance Rewriting on Editing Operation

Mar 20, 2025

Zhiyu Cao, Peifeng Li, Qiaoming Zhu, Yaxin Fan

Abstract:Previous work on Incomplete Utterance Rewriting (IUR) has primarily focused on generating rewritten utterances based solely on dialogue context, ignoring the widespread phenomenon of coreference and ellipsis in dialogues. To address this issue, we propose a novel framework called TEO (Two-stage approach on Editing Operation) for IUR, in which the first stage generates editing operations and the second stage rewrites incomplete utterances utilizing the generated editing operations and the dialogue context. Furthermore, an adversarial perturbation strategy is proposed to mitigate cascading errors and exposure bias caused by the inconsistency between training and inference in the second stage. Experimental results on three IUR datasets show that our TEO outperforms the SOTA models significantly.

Incomplete Utterance Rewriting with Editing Operation Guidance and Utterance Augmentation

Mar 20, 2025

Zhiyu Cao, Peifeng Li, Yaxin Fan, Qiaoming Zhu

Abstract:Although popular generation methods for Incomplete Utterance Rewriting (IUR) can generate coherent utterances, they often include irrelevant and redundant tokens in rewritten utterances because they cannot focus on critical tokens in the dialogue context. Furthermore, the limited size of the training datasets also contributes to the insufficient training of the IUR model. To address the first issue, we propose a multi-task learning framework EO-IUR (Editing Operation-guided Incomplete Utterance Rewriting) that introduces the editing operation labels generated by a sequence labeling module to guide the generation model to focus on critical tokens. Furthermore, we introduce a token-level heterogeneous graph to represent dialogues. To address the second issue, we propose a two-dimensional utterance augmentation strategy, namely editing operation-based incomplete utterance augmentation and LLM-based historical utterance augmentation. The experimental results on three datasets demonstrate that our EO-IUR outperforms previous state-of-the-art (SOTA) baselines in both open-domain and task-oriented dialogue. The code will be available at https://github.com/Dewset/EO-IUR.

Revealing and Mitigating Over-Attention in Knowledge Editing

Feb 20, 2025

Pinzheng Wang, Zecheng Tang, Keyan Zhou, Juntao Li, Qiaoming Zhu, Min Zhang

Abstract:Large Language Models have demonstrated superior performance across a wide range of tasks, but they still exhibit undesirable errors due to incorrect knowledge learned from the training data. To avoid this, knowledge editing methods have emerged to precisely edit specific model knowledge by efficiently modifying a very small percentage of parameters. However, those methods can lead to the problem of Specificity Failure, where the existing knowledge and capabilities are severely degraded due to editing. Our preliminary analysis indicates that Specificity Failure primarily stems from the model's attention heads assigning excessive attention scores to entities related to the edited knowledge, thereby unduly focusing on specific snippets within the context, which we denote as the Attention Drift phenomenon. To mitigate this Attention Drift issue, we introduce a simple yet effective method, Selective Attention Drift Restriction (SADR), which introduces an additional regularization term during the knowledge editing process to restrict changes in the attention weight distribution, thereby preventing undue focus on the edited entity. Experiments on five frequently used strong LLMs demonstrate the effectiveness of our method, where SADR can significantly mitigate Specificity Failure in the predominant knowledge editing tasks.
