Abstract: Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. However, click interactions inherently contain substantial noise, including accidental clicks, clickbait-induced interactions, and exploratory browsing behaviors that do not reflect genuine user preferences. Training recommendation models on such noisy positive samples degrades prediction accuracy and yields unreliable recommendations. In this paper, we propose SAID (Semantics-Aware Implicit Denoising), a simple yet effective framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Our approach constructs textual user interest profiles from historical behaviors and computes their semantic similarity with target item descriptions using text encoders based on pre-trained language models (PLMs). The similarity scores are then transformed into sample weights that modulate the training loss, effectively reducing the impact of semantically inconsistent clicks. Unlike existing denoising methods that require complex auxiliary networks or multi-stage training procedures, SAID only modifies the loss function while keeping the backbone recommendation model unchanged. Extensive experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines, with particularly notable robustness under high noise conditions.
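The weighting idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how similarity is mapped to a weight, so the linear rescaling with a `floor` parameter, the choice to reweight only positive (clicked) samples, and all function names below are assumptions made for illustration. The embeddings are assumed to come from some PLM-based text encoder and are passed in as plain lists of floats.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_to_weight(sim, floor=0.1):
    """Assumed mapping: rescale similarity from [-1, 1] linearly to [floor, 1]."""
    return floor + (1.0 - floor) * (sim + 1.0) / 2.0


def semantic_weights(profile_embs, item_embs, labels, floor=0.1):
    """One weight per sample; only clicks (label 1) are reweighted (an assumption)."""
    weights = []
    for u, v, y in zip(profile_embs, item_embs, labels):
        if y == 1:
            weights.append(similarity_to_weight(cosine_similarity(u, v), floor))
        else:
            weights.append(1.0)
    return weights


def weighted_bce(labels, preds, weights, eps=1e-7):
    """Binary cross-entropy where each sample's loss is scaled by its weight."""
    total = 0.0
    for y, p, w in zip(labels, preds, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += w * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


# Hypothetical toy batch: user-profile and item-text embeddings, click labels,
# and the backbone model's predicted click probabilities.
profiles = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
items = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
labels = [1, 1, 0]
preds = [0.9, 0.1, 0.2]

weights = semantic_weights(profiles, items, labels)
loss = weighted_bce(labels, preds, weights)
```

In this sketch the backbone model only supplies `preds`; the weights are computed from text embeddings alone, which matches the abstract's claim that only the loss function changes. The second sample, a click on an item whose description points away from the user profile, contributes much less to the loss than it would under uniform weighting.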
Abstract: Recent advances in retrieval-augmented generation (RAG) have shown promise in enhancing recommendation systems with external knowledge. However, existing RAG-based recommenders face two critical challenges: (1) vulnerability to distribution shifts across different environments (e.g., time periods, user segments), leading to performance degradation in out-of-distribution (OOD) scenarios, and (2) lack of faithful explanations that can be verified against retrieved evidence. In this paper, we propose CIRR, a Causal-Invariant Retrieval-Augmented Recommendation framework that addresses both challenges simultaneously. CIRR learns environment-invariant user preference representations through causal inference, which guide a debiased retrieval process to select relevant evidence from multiple sources. Furthermore, we introduce consistency constraints that enforce faithfulness between retrieved evidence, generated explanations, and recommendation outputs. Extensive experiments on two real-world datasets demonstrate that CIRR achieves robust performance under distribution shifts, reducing performance degradation from 15.4% (baseline) to only 5.6% in OOD scenarios, while providing more faithful and interpretable explanations (26% improvement in faithfulness score) compared to state-of-the-art baselines.