Abstract:Self-evolving search agents reduce reliance on human-written training questions by generating and solving their own search tasks. We build on Search Self-Play (SSP), a representative Proposer and Solver framework in which questions are generated and answered via multi-step search and reasoning. In practice, however, SSP faces two bottlenecks: the Proposer constructs questions from isolated answer entities without relational context, yielding many invalid or unverifiable questions in early self-play training, while the Solver receives only a binary outcome reward that discards useful signal from partially on-track search trajectories. We address both bottlenecks by reusing knowledge-graph paths as construction-derived intermediate supervision for both question construction and reward shaping. First, we ground question construction in LLM-guided knowledge-graph subgraphs, providing relational context for the Proposer. Second, we observe that constructing and solving a multi-hop question can involve overlapping intermediate entities: the factual bridges used to formulate the question may provide approximate waypoints for answering it. Exploiting this overlap, we introduce Waypoint Coverage Reward (WCR), which grants graded partial credit to incorrect Solver trajectories according to their coverage of entities on the construction path, while preserving full reward for correct answers. Across seven QA benchmarks and nine model configurations, our approach improves the average score over standard SSP in all configurations, including notable gains on multi-hop QA tasks. These results suggest that knowledge-graph paths can be reused as lightweight intermediate supervision, providing both relational guidance and process feedback without additional task-specific human annotations or manually labeled process steps.
Abstract:Multimodal large language models (MLLMs) exhibit a pronounced preference for textual inputs when processing vision-language data, limiting their ability to reason effectively from visual evidence. Unlike prior studies that attribute this text bias to external factors such as data imbalance or instruction tuning, we propose that the bias originates from the model's internal architecture. Specifically, we hypothesize that visual key vectors (Visual Keys) are out-of-distribution (OOD) relative to the text key space learned during language-only pretraining. Consequently, these visual keys receive systematically lower similarity scores during attention computation, leading to their under-utilization in the context representation. To validate this hypothesis, we extract key vectors from LLaVA and Qwen2.5-VL and analyze their distributional structures using qualitative (t-SNE) and quantitative (Jensen-Shannon divergence) methods. The results provide direct evidence that visual and textual keys occupy markedly distinct subspaces within the attention space. The inter-modal divergence is statistically significant, exceeding intra-modal variation by several orders of magnitude. These findings reveal that text bias arises from an intrinsic misalignment within the attention key space rather than solely from external data factors.
Abstract:Traditional Chinese medicine (TCM) exhibits remarkable therapeutic efficacy in disease treatment and healthcare through personalized herb prescriptions. However, current herb recommendation models inadequately capture the multiscale relations between herbs and clinical symptoms, particularly neglecting latent correlations at the chemical-molecular scale. To address these limitations, we propose the Fusion of Multiscale Correlations of Herbs and Symptoms (FMCHS), an innovative framework that synergistically integrates molecular-scale chemical characteristics of herbs with clinical symptoms. The framework employs multi-relational graph transformer layers to generate enriched embeddings that preserve both structural and semantic features within herbs and symptoms. Through systematic incorporation of herb chemical profiles into node embeddings and implementation of attention-based feature fusion, FMCHS effectively utilizes multiscale correlations. Comprehensive evaluations demonstrate FMCHS's superior performance over the state-of-the-art (SOTA) baseline, achieving relative improvements of 8.85% in Precision@5, 12.30% in Recall@5, and 10.86% in F1@5 compared to the SOTA model on benchmark datasets. This work facilitates the practical application of TCM in disease treatment and healthcare.