Xinkai Du

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

Nov 14, 2025

Jun Xu, Xinkai Du, Yu Ao, Peilong Zhao, Yang Li, Ling Zhong, Lin Yuan, Zhongpu Bo, Xiaorui Wang, Mengshu Sun(+10 more)

Abstract: Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes. The source code is available at https://github.com/OpenSPG/KAG-Thinker.
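The decomposition described in the abstract can be pictured with a small sketch. This is not the KAG-Thinker implementation: `SubProblem`, `solve`, the `{o1}`-style slots, the dictionary standing in for knowledge boundary determination, and the stub retriever are all hypothetical names and simplifications chosen only to show how an earlier sub-problem's answer might be passed as a parameter to a later one, and how a search might be skipped when the answer is already known.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubProblem:
    """Hypothetical container for one decomposed step (not the paper's schema)."""
    name: str                       # identifier that later steps can reference
    question: str                   # natural-language form; may contain {name} slots
    depends_on: List[str] = field(default_factory=list)


def solve(plan: List[SubProblem],
          known: Dict[str, str],
          search: Callable[[str], str]) -> Dict[str, str]:
    """Answer each step in order, filling slots with earlier answers.

    `known` is a toy stand-in for knowledge boundary determination:
    if the filled-in question is a key, no external search is issued.
    The plan is assumed to be ordered so dependencies come first.
    """
    answers: Dict[str, str] = {}
    for step in plan:
        # Pass earlier answers into this step as parameters.
        args = {dep: answers[dep] for dep in step.depends_on}
        question = step.question.format(**args)
        if question in known:
            answers[step.name] = known[question]   # answered directly
        else:
            answers[step.name] = search(question)  # falls back to retrieval
    return answers


# Toy usage with a stub retriever that records what it was asked.
searched: List[str] = []

def fake_search(query: str) -> str:
    searched.append(query)
    return "London"

plan = [
    SubProblem("o1", "Who directed Inception?"),
    SubProblem("o2", "Where was {o1} born?", depends_on=["o1"]),
]
known = {"Who directed Inception?": "Christopher Nolan"}
answers = solve(plan, known, fake_search)
```

In this toy run only the second sub-problem reaches the stub retriever, because the first one is found in `known`.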

* Accepted to AAAI 2026. Extended version with full Appendix

Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation

Dec 25, 2024

Xinkai Du, Quanjie Han, Chao Lv, Yan Liu, Yalin Sun, Hao Shu, Hongbo Shan, Maosong Sun

Abstract: Open-domain Question Answering (QA) has garnered substantial interest by combining the advantages of faithfully retrieved passages and relevant passages generated through Large Language Models (LLMs). However, there is a lack of definitive labels available to pair these sources of knowledge. To address this issue, we propose an unsupervised and simple framework called Bi-Reranking for Merging Generated and Retrieved Knowledge (BRMGR), which utilizes re-ranking methods for both retrieved passages and LLM-generated passages. We pair the two types of passages using two separate re-ranking methods and then combine them through greedy matching. We demonstrate that BRMGR is equivalent to employing a bipartite matching loss when assigning each retrieved passage a corresponding LLM-generated passage. In experiments on three datasets, BRMGR improves performance by +1.7 and +1.6 on the NQ and WebQ datasets, respectively, and obtains comparable results on the TriviaQA dataset relative to competitive baselines.
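The greedy matching step mentioned in the abstract can be illustrated with a generic sketch. It assumes a single score per (retrieved, generated) pair is already available; how BRMGR derives its pairings from the two re-ranking methods is not specified here, so `greedy_match` and the example scores below are hypothetical stand-ins rather than the authors' procedure.

```python
from typing import Dict, List, Tuple


def greedy_match(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Greedily pair retrieved and generated passages by descending pair score.

    Each passage is used at most once. This is plain greedy bipartite
    matching, shown as an illustration only.
    """
    used_retrieved, used_generated = set(), set()
    pairs: List[Tuple[str, str]] = []
    # Visit candidate pairs from highest to lowest score; ties broken by key.
    for (r, g), _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if r in used_retrieved or g in used_generated:
            continue  # one side of this pair is already matched
        pairs.append((r, g))
        used_retrieved.add(r)
        used_generated.add(g)
    return pairs


# Hypothetical pair scores (illustrative values, not experimental data).
example_scores = {
    ("r1", "g1"): 0.9,
    ("r1", "g2"): 0.8,
    ("r2", "g1"): 0.85,
    ("r2", "g2"): 0.1,
}
matched = greedy_match(example_scores)
```

Here the highest-scoring pair `("r1", "g1")` is taken first, which forces `r2` onto `g2` even though that pair scores low. Greedy matching trades a possibly better overall assignment for simplicity.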

* Accepted by ICASSP 2025
