Abstract:Adapting general-domain retrievers to scientific domains is challenging due to the scarcity of large-scale domain-specific relevance annotations and the substantial mismatch in vocabulary and information needs. Recent approaches address these issues through two independent directions that leverage large language models (LLMs): (1) generating synthetic queries for fine-tuning, and (2) generating auxiliary contexts to support relevance matching. However, both directions overlook the diverse academic concepts embedded within scientific documents, often producing redundant or conceptually narrow queries and contexts. To address this limitation, we introduce an academic concept index, which extracts key concepts from papers and organizes them guided by an academic taxonomy. This structured index serves as a foundation for improving both directions. First, we enhance the synthetic query generation with concept coverage-based generation (CCQGen), which adaptively conditions LLMs on uncovered concepts to generate complementary queries with broader concept coverage. Second, we strengthen the context augmentation with concept-focused auxiliary contexts (CCExpand), which leverages a set of document snippets that serve as concise responses to the concept-aware CCQGen queries. Extensive experiments show that incorporating the academic concept index into both query generation and context augmentation leads to higher-quality queries, better conceptual alignment, and improved retrieval performance.
Abstract:Language Models (LMs) are increasingly employed in recommendation systems due to their advanced language understanding and generation capabilities. Recent recommender systems based on generative retrieval have leveraged the inferential abilities of LMs to directly generate the index tokens of the next item, based on item sequences within the user's interaction history. Previous studies have mostly focused on item indices based solely on textual semantic or collaborative information. However, although the standalone effectiveness of these aspects has been demonstrated, the integration of this information has remained unexplored. Our in-depth analysis finds that there is a significant difference in the knowledge captured by the model from heterogeneous item indices and diverse input prompts, which can have a high potential for complementarity. In this paper, we propose SC-Rec, a unified recommender system that learns diverse preference knowledge from two distinct item indices and multiple prompt templates. Furthermore, SC-Rec adopts a novel reranking strategy that aggregates a set of ranking results, inferred based on different indices and prompts, to achieve the self-consistency of the model. Our empirical evaluation on three real-world datasets demonstrates that SC-Rec considerably outperforms the state-of-the-art methods for sequential recommendation, effectively incorporating complementary knowledge from varied outputs of the model.
Abstract:Cross-domain recommendation (CDR) extends conventional recommender systems by leveraging user-item interactions from dense domains to mitigate data sparsity and the cold start problem. While CDR offers substantial potential for enhancing recommendation performance, most existing CDR methods suffer from sensitivity to the ratio of overlapping users and intrinsic discrepancy between source and target domains. To overcome these limitations, in this work, we explore the application of graph signal processing (GSP) in CDR scenarios. We propose CGSP, a unified CDR framework based on GSP, which employs a cross-domain similarity graph constructed by flexibly combining target-only similarity and source-bridged similarity. By processing personalized graph signals computed for users from either the source or target domain, our framework effectively supports both inter-domain and intra-domain recommendations. Our empirical evaluation demonstrates that CGSP consistently outperforms various encoder-based CDR approaches in both intra-domain and inter-domain recommendation scenarios, especially when the ratio of overlapping users is low, highlighting its significant practical implication in real-world applications.