Title: Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

May 28, 2025

Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Yishan Li, Yukun Yan, Shuo Wang, Zhiyuan Liu, Yu Gu, Minghe Yu, Ge Yu (+1 more)

Abstract: Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a static retrieval pipeline that fetches relevant information from multiple Knowledge Bases (KBs), followed by a refinement step. However, these approaches do not exploit the reasoning and planning capabilities of MLLMs to decide dynamically how to interact with different KBs during the reasoning process. To address this limitation, we propose R1-Router, a novel MRAG framework that learns to decide when and where to retrieve knowledge based on the evolving reasoning state. Specifically, R1-Router generates follow-up queries according to the current reasoning step, routes these intermediate queries to the most suitable KB, and integrates the retrieved knowledge into a coherent reasoning trajectory to answer the original query. Furthermore, we introduce Step-wise Group Relative Policy Optimization (Step-GRPO), a tailored reinforcement learning algorithm that assigns step-specific rewards to optimize the reasoning behavior of MLLMs. Experimental results on various open-domain QA benchmarks across multiple modalities demonstrate that R1-Router outperforms baseline models by over 7%. Further analysis shows that R1-Router can adaptively and effectively leverage diverse KBs, reducing unnecessary retrievals and improving both efficiency and accuracy.
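The step-wise loop described in the abstract (propose a follow-up query, route it to a KB, fold the evidence back into the reasoning state, stop when no further retrieval is needed) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `ReasoningState`, `answer_with_routing`, `toy_policy`, and the two toy KBs are hypothetical names invented for illustration, and a hand-written rule stands in for the learned MLLM policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical KB interface: a callable mapping a query string to a passage.
KnowledgeBase = Callable[[str], str]


@dataclass
class ReasoningState:
    """Original question plus the (query, kb_name, evidence) steps so far."""
    question: str
    steps: List[Tuple[str, str, str]] = field(default_factory=list)


# A policy returns (follow_up_query, kb_name), or None to stop retrieving.
Policy = Callable[[ReasoningState], Optional[Tuple[str, str]]]


def answer_with_routing(
    question: str,
    propose_step: Policy,
    kbs: Dict[str, KnowledgeBase],
    max_steps: int = 3,
) -> ReasoningState:
    """Run the route-and-retrieve loop for at most max_steps steps."""
    state = ReasoningState(question)
    for _ in range(max_steps):
        decision = propose_step(state)
        if decision is None:  # the policy judges that no more retrieval is needed
            break
        query, kb_name = decision
        evidence = kbs[kb_name](query)  # route the query to the chosen KB
        state.steps.append((query, kb_name, evidence))
    return state


# Toy stand-ins (assumptions): a real system would use retrievers and an MLLM.
TOY_KBS: Dict[str, KnowledgeBase] = {
    "image_kb": lambda q: "Eiffel Tower",
    "text_kb": lambda q: "1889" if "Eiffel Tower" in q else "unknown",
}


def toy_policy(state: ReasoningState) -> Optional[Tuple[str, str]]:
    """Hand-written rule replacing the learned policy, for illustration only."""
    if not state.steps:
        return ("What landmark is shown in the image?", "image_kb")
    if len(state.steps) == 1:
        landmark = state.steps[-1][2]
        return (f"When was {landmark} built?", "text_kb")
    return None  # enough evidence gathered


state = answer_with_routing("When was this landmark built?", toy_policy, TOY_KBS)
```

In this sketch the second query depends on the evidence returned by the first, which is the sense in which routing follows the evolving reasoning state. Returning `None` from the policy is one simple way to model skipping unnecessary retrievals.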