Kyungro Lee

ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting

Mar 02, 2026

Jiyoon Myung, Jungki Son, Kyungro Lee, Jihyeon Park, Joohyung Han

Abstract: Retrieval systems often fail when user queries differ stylistically or semantically from the language used in domain documents. Query rewriting has been proposed to bridge this gap, improving retrieval by reformulating user queries into semantically equivalent forms. However, most existing approaches overlook the stylistic characteristics of target documents (their domain-specific phrasing, tone, and structure), which are crucial for matching real-world data distributions. We introduce a retrieval feedback-driven dataset generation framework that automatically identifies failed retrieval cases, leverages large language models to rewrite queries in the style of relevant documents, and verifies improvement through re-retrieval. The resulting corpus of (original, rewritten) query pairs enables the training of rewriter models that are explicitly aware of document style and retrieval feedback. This work highlights a new direction in data-centric information retrieval, emphasizing how feedback loops and document-style alignment can enhance the reasoning and adaptability of RAG systems in real-world, domain-specific contexts.

* Accepted at the Workshop on New Frontiers in Information Retrieval (AAAI 2026)
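
The three stages named in the abstract (detect a failed retrieval, rewrite the query in the style of the relevant document, verify by re-retrieval) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the word-overlap `retrieve` function stands in for a real retriever, `stub_rewrite` and its `GLOSSARY` stand in for an LLM rewriter, and the corpus and queries are invented.

```python
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization (toy choice for this sketch)."""
    return set(text.lower().split())


def retrieve(query: str, corpus: Dict[str, str], k: int = 1) -> List[str]:
    """Toy retriever: rank document ids by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(q & tokenize(corpus[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def build_pairs(
    examples: List[Tuple[str, str]],
    corpus: Dict[str, str],
    rewrite: Callable[[str, str], str],
    k: int = 1,
) -> List[Tuple[str, str]]:
    """Collect (original, rewritten) pairs whose rewrite fixes a failed retrieval."""
    pairs = []
    for query, gold_id in examples:
        if gold_id in retrieve(query, corpus, k):
            continue  # retrieval already succeeds, nothing to repair
        candidate = rewrite(query, corpus[gold_id])
        # Keep the pair only if re-retrieval now finds the relevant document.
        if gold_id in retrieve(candidate, corpus, k):
            pairs.append((query, candidate))
    return pairs


# Hypothetical stand-in for an LLM rewriter: swap lay phrases for the
# domain phrasing when that phrasing occurs in the relevant document.
GLOSSARY = {"heart attack": "myocardial infarction"}


def stub_rewrite(query: str, relevant_doc: str) -> str:
    out = query.lower()
    for lay, formal in GLOSSARY.items():
        if formal in relevant_doc.lower():
            out = out.replace(lay, formal)
    return out


# Invented corpus and (query, relevant document id) examples.
CORPUS = {
    "d1": "Management of acute myocardial infarction in adults",
    "d2": "Heart rate monitoring during an asthma attack",
}
EXAMPLES = [
    ("heart attack treatment", "d1"),
    ("asthma attack monitoring", "d2"),
]

pairs = build_pairs(EXAMPLES, CORPUS, stub_rewrite)
print(pairs)
```

In this toy run, only the first query fails its initial retrieval, so only that query is rewritten and kept. A real pipeline would replace both the retriever and the rewriter while keeping the same filter-rewrite-verify shape.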

MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue

Dec 08, 2025

Kyungro Lee, Dongha Choi, Hyunju Lee

Abstract: As dialogue systems become increasingly important across various domains, a key challenge in persona-based dialogue is generating engaging and context-specific interactions while ensuring the model acts with a coherent personality. However, existing persona-based dialogue datasets lack explicit relations between persona sentences and responses, which makes it difficult for models to effectively capture persona information. To address these issues, we propose MoCoRP (Modeling Consistent Relations between Persona and Response), a framework that incorporates explicit relations into language models. MoCoRP leverages an NLI expert to explicitly extract the NLI relations between persona sentences and responses, enabling the model to effectively incorporate appropriate persona information from the context into its responses. We applied this framework to pre-trained models such as BART and further extended it to modern large language models (LLMs) through alignment tuning. Experimental results on the public datasets ConvAI2 and MPChat demonstrate that MoCoRP outperforms existing baselines, achieving superior persona consistency and engaging, context-aware dialogue generation. Furthermore, our model not only excels in quantitative metrics but also shows significant improvements in qualitative aspects. These results highlight the effectiveness of explicitly modeling persona-response relations in persona-based dialogue. The source code of MoCoRP is available at https://github.com/DMCB-GIST/MoCoRP.

* 18 pages
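
As a rough illustration of what extracting NLI relations between persona sentences and a response could look like, the sketch below labels each (persona sentence, response) pair as entailment, neutral, or contradiction. It is not taken from the MoCoRP repository: `extract_relations` is a hypothetical helper, and `toy_nli` is a deliberately crude keyword heuristic standing in for a trained NLI model. How the labels are then fed into the language model is not shown, since the abstract does not specify it.

```python
import re
from typing import Callable, List, Tuple

# Small hand-picked word lists, assumed only for this toy heuristic.
STOPWORDS = {"i", "a", "the", "my", "to", "and", "have", "do", "is", "am"}
NEGATIONS = {"not", "never", "no"}


def words(text: str) -> set:
    """Lowercase alphabetic tokens, punctuation dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_nli(premise: str, hypothesis: str) -> str:
    """Hypothetical stand-in for an NLI model (keyword heuristic only)."""
    p, h = words(premise), words(hypothesis)
    shared = (p & h) - STOPWORDS - NEGATIONS
    if not shared:
        return "neutral"  # no content words in common
    if bool(p & NEGATIONS) != bool(h & NEGATIONS):
        return "contradiction"  # same topic, negation on one side only
    return "entailment"


def extract_relations(
    persona: List[str],
    response: str,
    nli: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Pair every persona sentence with its NLI label against the response."""
    return [(sentence, nli(sentence, response)) for sentence in persona]


# Invented persona and response for demonstration.
persona = ["I have two dogs.", "I never eat meat.", "I work as a nurse."]
response = "My dogs and I eat meat every weekend."

relations = extract_relations(persona, response, toy_nli)
for sentence, label in relations:
    print(f"{label:14s} {sentence}")
```

Swapping `toy_nli` for a real NLI classifier leaves `extract_relations` unchanged, which is the only point of the callable parameter.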

Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation

Oct 01, 2024

Jiyoon Myung, Jihyeon Park, Jungki Son, Kyungro Lee, Joohyung Han

Abstract: This paper addresses the challenge of accurately translating technical terms, which are crucial for clear communication in specialized fields. We introduce the Parenthetical Terminology Translation (PTT) task, designed to mitigate potential inaccuracies by displaying the original term in parentheses alongside its translation. To implement this approach, we generated a representative PTT dataset using a collaborative approach with large language models and applied knowledge distillation to fine-tune traditional Neural Machine Translation (NMT) models and small-sized Large Language Models (sLMs). Additionally, we developed a novel evaluation metric to assess both overall translation accuracy and the correct parenthetical presentation of terms. Our findings indicate that sLMs did not consistently outperform NMT models, with fine-tuning proving more effective than few-shot prompting, particularly in models with continued pre-training in the target language. These insights contribute to the advancement of more reliable terminology translation methodologies.

* Paper accepted in EMNLPW 2024
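
To make the parenthetical-presentation idea concrete, the sketch below checks what fraction of expected source-language terms appear inside parentheses in a translated sentence. This is not the evaluation metric proposed in the paper, whose definition is not given in the abstract; `parenthetical_rate` is a hypothetical helper, and the Spanish example sentence and term list are invented for illustration. It covers only the presentation side and says nothing about overall translation accuracy.

```python
import re
from typing import List


def parenthetical_rate(translation: str, source_terms: List[str]) -> float:
    """Fraction of source terms that appear verbatim inside parentheses.

    Matching is case-insensitive and requires the whole parenthesized
    span to equal the term (an assumption of this sketch).
    """
    if not source_terms:
        return 0.0  # nothing to score
    in_parens = {
        span.strip().lower()
        for span in re.findall(r"\(([^()]*)\)", translation)
    }
    hits = sum(1 for term in source_terms if term.lower() in in_parens)
    return hits / len(source_terms)


# Invented example: one term is presented parenthetically, one is not.
translation = (
    "La destilación de conocimiento (knowledge distillation) se usa para "
    "ajustar modelos de traducción automática neuronal."
)
terms = ["knowledge distillation", "neural machine translation"]

rate = parenthetical_rate(translation, terms)
print(rate)
```

A fuller metric would also need to confirm that each parenthesis follows the correct translated term, which this sketch does not attempt.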
