Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuhui Cai

DMA: Online RAG Alignment with Human Feedback

Nov 06, 2025

Yu Bai, Yukai Miao, Dawei Wang, Li Chen, Fei Long, Rundi Zhai, Dan Li, Yanyu Ren, Tianfeng Liu, Hongtao Xie(+2 more)

Figure 1 for DMA: Online RAG Alignment with Human Feedback

Figure 2 for DMA: Online RAG Alignment with Human Feedback

Figure 3 for DMA: Online RAG Alignment with Human Feedback

Figure 4 for DMA: Online RAG Alignment with Human Feedback

Abstract:Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning framework that systematically incorporates multi-granularity human feedback to align ranking in interactive settings. DMA organizes document-, list-, and response-level signals into a coherent learning pipeline: supervised training for pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Throughout this paper, memory refers to the model's working memory, which is the entire context visible to the LLM for In-Context Learning. We adopt a dual-track evaluation protocol mirroring deployment: (i) large-scale online A/B ablations to isolate the utility of each feedback source, and (ii) few-shot offline tests on knowledge-intensive benchmarks. Online, a multi-month industrial deployment further shows substantial improvements in human engagement. Offline, DMA preserves competitive foundational retrieval while yielding notable gains on conversational QA (TriviaQA, HotpotQA). Taken together, these results position DMA as a principled approach to feedback-driven, real-time adaptation in RAG without sacrificing baseline capability.

Via

Access Paper or Ask Questions

Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

Jun 21, 2024

Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

Figure 1 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

Figure 2 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

Figure 3 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

Figure 4 for Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

Abstract:In Greek mythology, Pistis symbolized good faith, trust, and reliability, echoing the core principles of RAG in LLM systems. Pistis-RAG, a scalable multi-stage framework, effectively addresses the challenges of large-scale retrieval-augmented generation (RAG). Each stage plays a distinct role: matching refines the search space, pre-ranking prioritizes semantically relevant documents, and ranking aligns with the large language model's (LLM) preferences. The reasoning and aggregating stage supports the implementation of complex chain-of-thought (CoT) methods within this cascading structure. We argue that the lack of strong alignment between LLMs and the external knowledge ranking methods used in RAG tasks is relevant to the reliance on the model-centric paradigm in RAG frameworks. A content-centric approach would prioritize seamless integration between the LLMs and external information sources, optimizing the content transformation process for each specific task. Critically, our ranking stage deviates from traditional RAG approaches by recognizing that semantic relevance alone may not directly translate to improved generation. This is due to the sensitivity of the few-shot prompt order, as highlighted in prior work \cite{lu2021fantastically}. Current RAG frameworks fail to account for this crucial factor. We introduce a novel ranking stage specifically designed for RAG systems. It adheres to information retrieval principles while considering the unique business scenario captured by LLM preferences and user feedback. Our approach integrates in-context learning (ICL) methods and reasoning steps to incorporate user feedback, ensuring efficient alignment. Experiments on the MMLU benchmark demonstrate a 9.3\% performance improvement. The model and code will be open-sourced on GitHub. Experiments on real-world, large-scale data validate our framework's scalability.

Via

Access Paper or Ask Questions