Abstract: Large Language Models (LLMs) are equipped with profound semantic knowledge, making them a natural choice for injecting semantic generalization into personalized search systems. In practice, however, we find that directly fine-tuning LLMs on industrial personalized tasks (e.g., next-item prediction) often yields suboptimal results. We attribute this bottleneck to a critical Knowledge--Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions through discriminative objectives. Empirically, action-only training objectives induce Semantic Collapse, which manifests, for example, as attention ``sinks''. This degradation severely cripples the LLM's generalization, so the LLM fails to bring improvements to personalized search systems. We propose KARMA (Knowledge--Action Regularized Multimodal Alignment), a unified framework that treats semantic reconstruction as a train-only regularizer. KARMA optimizes a next-interest embedding for retrieval (Action) while enforcing semantic decodability (Knowledge) through two complementary objectives: (i) history-conditioned semantic generation, which anchors optimization to the LLM's native next-token distribution, and (ii) embedding-conditioned semantic reconstruction, which constrains the interest embedding to remain semantically recoverable. On the Taobao search system, KARMA mitigates semantic collapse (as shown by attention-sink analysis) and improves both action metrics and semantic fidelity. In ablations, semantic decodability yields up to +22.5 HR@200. With KARMA, we achieve +0.25 CTR AUC in ranking, +1.86 HR in pre-ranking, and +2.51 HR in recall. Deployed online with low inference overhead at the ranking stage, KARMA drives a +0.5% increase in Item Click.
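
The way an action objective and two semantic regularizers might be combined can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax retrieval loss, the weights `w_gen` and `w_recon`, and all function names are hypothetical choices made for illustration, since the abstract does not specify the loss forms or their weighting.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Negative log-likelihood of target_index under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]

def mean_token_loss(token_logits, target_tokens):
    """Average per-token cross-entropy over a target token sequence."""
    losses = [softmax_cross_entropy(l, t)
              for l, t in zip(token_logits, target_tokens)]
    return sum(losses) / len(losses)

def combined_loss(interest_emb, item_embs, clicked_idx,
                  gen_logits, recon_logits, target_tokens,
                  w_gen=0.5, w_recon=0.5):
    """Action loss plus two semantic regularizers (assumed weighted sum)."""
    # Action: score candidate items by dot product with the interest
    # embedding and apply a softmax loss on the clicked item (assumed form).
    scores = [sum(a * b for a, b in zip(interest_emb, e)) for e in item_embs]
    action = softmax_cross_entropy(scores, clicked_idx)
    # Knowledge (i): token loss for text generated from the user history.
    gen = mean_token_loss(gen_logits, target_tokens)
    # Knowledge (ii): token loss for text decoded from the interest embedding.
    recon = mean_token_loss(recon_logits, target_tokens)
    total = action + w_gen * gen + w_recon * recon
    return total, {"action": action, "gen": gen, "recon": recon}
```

In this sketch the two token-level terms only shape training. At serving time only the interest embedding and the item scores would be needed, which is one reading of "train-only regularizer".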




Abstract: Social recommendation improves recommendation performance by leveraging social relations from online social networking platforms. Social relations among users provide friends' information for modeling users' interest in candidate items and help expose items to potential consumers (i.e., item attraction). However, two issues have not been well studied. First, for user interests, existing methods typically aggregate friends' information contextualized on the candidate item only, and this shallow context-aware aggregation limits the friends' information they can exploit. Second, for item attraction, if an item's past consumers are friends of the targeted user or have similar consumption habits, the item may be more attractive to the targeted user, but most existing methods neglect this relation-enhanced, context-aware item attraction. To address these issues, we propose DICER (Dual Side Deep Context-aware Modulation for Social Recommendation). Specifically, we first propose a novel graph neural network to model the social relation and the collaborative relation; on top of the resulting high-order relations, a dual-side deep context-aware modulation is introduced to capture the friends' information and item attraction. Empirical results on two real-world datasets show the effectiveness of the proposed model, and further experiments are conducted to help understand how the dual context-aware modulation works.
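
For reference, the shallow context-aware aggregation that the abstract attributes to existing methods can be sketched as attention over friends' embeddings, with weights conditioned only on the candidate item. This is an illustrative baseline, not DICER's modulation, whose details the abstract does not give. The dot-product scoring and the name `aggregate_friends` are assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_friends(friend_embs, item_emb):
    """Weight each friend by similarity to the candidate item, then average.

    The context is the candidate item alone, which is the 'shallow'
    setting described in the abstract (assumed dot-product attention).
    """
    scores = [sum(f * i for f, i in zip(friend, item_emb))
              for friend in friend_embs]
    weights = softmax(scores)
    dim = len(item_emb)
    aggregated = [sum(w * friend[d] for w, friend in zip(weights, friend_embs))
                  for d in range(dim)]
    return aggregated, weights
```

With two friends whose embeddings are `[1.0, 0.0]` and `[0.0, 1.0]`, a candidate item embedded at `[10.0, 0.0]` puts almost all weight on the first friend, while an all-zero item embedding weights both friends equally.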