Yuqi Bai

TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks

Jun 14, 2025

Zhou Chen, Zhiqiang Wei, Yuqi Bai, Xue Xiong, Jianmin Wu

Abstract: Model routing allocates queries to the suitable model, improving system performance while reducing costs. However, existing routing methods face practical limitations that hinder scalability in large-scale applications and struggle to keep up with the rapid growth of the large language model (LLM) ecosystem. To tackle these challenges, we propose TagRouter, a training-free model routing method designed to optimize the synergy among multiple LLMs for open-domain text generation tasks. Experimental results demonstrate that TagRouter outperforms 13 baseline methods, increasing the accept rate of the system by 6.15% and reducing costs by 17.20%, achieving optimal cost-efficiency. Our findings provide the LLM community with an efficient and scalable solution for model ensembling, offering users an evolvable "super model."

* ACL 2025, 26 pages, 13 figures, 14 tables

IntOPE: Off-Policy Evaluation in the Presence of Interference

Aug 24, 2024

Yuqi Bai, Ziyu Zhao, Minqin Zhu, Kun Kuang

Abstract: Off-Policy Evaluation (OPE) is employed to assess the potential impact of a hypothetical policy using logged contextual bandit feedback, which is crucial in areas such as personalized medicine and recommender systems, where online interactions are associated with significant risks and costs. Traditionally, OPE methods rely on the Stable Unit Treatment Value Assumption (SUTVA), which assumes that the reward for any given individual is unaffected by the actions of others. However, this assumption often fails in real-world scenarios due to the presence of interference, where an individual's reward is affected not just by their own actions but also by the actions of their peers. This realization reveals significant limitations of existing OPE methods in real-world applications. To address this limitation, we propose IntIPW, an IPW-style estimator that extends the Inverse Probability Weighting (IPW) framework by integrating marginalized importance weights to account for both individual actions and the influence of adjacent entities. Extensive experiments are conducted on both synthetic and real-world data to demonstrate the effectiveness of the proposed IntIPW method.
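
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the classic IPW estimator under SUTVA. It is not the paper's IntIPW estimator: the abstract does not give the form of the marginalized importance weights, so nothing here models interference. The names `ipw_value`, `LoggedRecord`, `always_action_one`, and the toy log are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# One logged interaction: (context, action, reward, logging propensity).
# This record layout is an assumption made for the example.
LoggedRecord = Tuple[int, int, float, float]


def ipw_value(
    logs: Sequence[LoggedRecord],
    target_prob: Callable[[int, int], float],
) -> float:
    """Classic IPW estimate of a target policy's value (assumes SUTVA).

    Each reward is reweighted by target_prob(context, action) divided by
    the probability with which the logging policy chose that action.
    """
    if not logs:
        raise ValueError("need at least one logged record")
    total = 0.0
    for context, action, reward, propensity in logs:
        if propensity <= 0.0:
            raise ValueError("logging propensity must be positive")
        weight = target_prob(context, action) / propensity
        total += weight * reward
    return total / len(logs)


def always_action_one(context: int, action: int) -> float:
    """Hypothetical deterministic target policy: always choose action 1."""
    return 1.0 if action == 1 else 0.0


# Toy log collected by a uniform logging policy over two actions.
toy_logs = [
    (0, 1, 1.0, 0.5),
    (0, 0, 0.0, 0.5),
    (1, 1, 0.0, 0.5),
    (1, 0, 1.0, 0.5),
]

estimate = ipw_value(toy_logs, always_action_one)
print(estimate)
```

An interference-aware estimator in the spirit of the abstract would presumably also reweight by how likely the target policy is to produce the observed actions of each unit's neighbors; that extension is left out here because its details are not stated in the text above.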
