Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yinchun Wang

Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints

Jul 23, 2025

Zhenyun Yin, Shujie Wang, Xuhong Wang, Xingjun Ma, Yinchun Wang

Abstract:Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose \textbf{Deliberative Searcher}, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes for accuracy under a soft reliability constraint. Empirical results show that proposed method improves alignment between model confidence and correctness, leading to more trustworthy outputs. This paper will be continuously updated.

* The inconsistency of base models undermines the fairness of evaluation comparisons and affects the validity of the paper's conclusions

Via

Access Paper or Ask Questions

Shai: A large language model for asset management

Dec 21, 2023

Zhongyang Guo, Guanran Jiang, Zhongdan Zhang, Peng Li, Zhefeng Wang, Yinchun Wang

Abstract:This paper introduces "Shai" a 10B level large language model specifically designed for the asset management industry, built upon an open-source foundational model. With continuous pre-training and fine-tuning using a targeted corpus, Shai demonstrates enhanced performance in tasks relevant to its domain, outperforming baseline models. Our research includes the development of an innovative evaluation framework, which integrates professional qualification exams, tailored tasks, open-ended question answering, and safety assessments, to comprehensively assess Shai's capabilities. Furthermore, we discuss the challenges and implications of utilizing large language models like GPT-4 for performance assessment in asset management, suggesting a combination of automated evaluation and human judgment. Shai's development, showcasing the potential and versatility of 10B-level large language models in the financial sector with significant performance and modest computational requirements, hopes to provide practical insights and methodologies to assist industry peers in their similar endeavors.

Via

Access Paper or Ask Questions