Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Recommendation

What is Recommendation? Recommendation is the task of providing personalized suggestions to users based on their preferences and behavior.

Deep Reinforcement Learning for Ranking Utility Tuning in the Ad Recommender System at Pinterest

Sep 05, 2025

Xiao Yang, Mehdi Ben Ayed, Longyu Zhao, Fan Zhou, Yuchen Shen, Abe Engle, Jinfeng Zhuang, Ling Leng, Jiajing Xu, Charles Rosenberg(+1 more)

Abstract:The ranking utility function in an ad recommender system, which linearly combines predictions of various business goals, plays a central role in balancing values across the platform, advertisers, and users. Traditional manual tuning, while offering simplicity and interpretability, often yields suboptimal results due to its unprincipled tuning objectives, the vast amount of parameter combinations, and its lack of personalization and adaptability to seasonality. In this work, we propose a general Deep Reinforcement Learning framework for Personalized Utility Tuning (DRL-PUT) to address the challenges of multi-objective optimization within ad recommender systems. Our key contributions include: 1) Formulating the problem as a reinforcement learning task: given the state of an ad request, we predict the optimal hyperparameters to maximize a pre-defined reward. 2) Developing an approach to directly learn an optimal policy model using online serving logs, avoiding the need to estimate a value function, which is inherently challenging due to the high variance and unbalanced distribution of immediate rewards. We evaluated DRL-PUT through an online A/B experiment in Pinterest's ad recommender system. Compared to the baseline manual utility tuning approach, DRL-PUT improved the click-through rate by 9.7% and the long click-through rate by 7.7% on the treated segment. We conducted a detailed ablation study on the impact of different reward definitions and analyzed the personalization aspect of the learned policy model.

Via

Access Paper or Ask Questions

Temporal Interest-Driven Multimodal Personalized Content Generation

Sep 04, 2025

Tian Miao

Figure 1 for Temporal Interest-Driven Multimodal Personalized Content Generation

Abstract:With the dynamic evolution of user interests and the increasing multimodal demands in internet applications, personalized content generation strategies based on static interest preferences struggle to meet practical application requirements. The proposed TIMGen (Temporal Interest-driven Multimodal Generation) model addresses this challenge by modeling the long-term temporal evolution of users' interests and capturing dynamic interest representations with strong temporal dependencies. This model also supports the fusion of multimodal features, such as text, images, video, and audio, and delivers customized content based on multimodal preferences. TIMGen jointly learns temporal dependencies and modal preferences to obtain a unified interest representation, which it then generates to meet users' personalized content needs. TIMGen overcomes the shortcomings of personalized content recommendation methods based on static preferences, enabling flexible and dynamic modeling of users' multimodal interests, better understanding and capturing their interests and preferences. It can be extended to a variety of practical application scenarios, including e-commerce, advertising, online education, and precision medicine, providing insights for future research.

Via

Access Paper or Ask Questions

Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models

Sep 04, 2025

Juraj Vladika, Mahdi Dhaini, Florian Matthes

Abstract:The growing capabilities of Large Language Models (LLMs) show significant potential to enhance healthcare by assisting medical researchers and physicians. However, their reliance on static training data is a major risk when medical recommendations evolve with new research and developments. When LLMs memorize outdated medical knowledge, they can provide harmful advice or fail at clinical reasoning tasks. To investigate this problem, we introduce two novel question-answering (QA) datasets derived from systematic reviews: MedRevQA (16,501 QA pairs covering general biomedical knowledge) and MedChangeQA (a subset of 512 QA pairs where medical consensus has changed over time). Our evaluation of eight prominent LLMs on the datasets reveals consistent reliance on outdated knowledge across all models. We additionally analyze the influence of obsolete pre-training data and training strategies to explain this phenomenon and propose future directions for mitigation, laying the groundwork for developing more current and reliable medical AI systems.

* Accepted to Findings of EMNLP 2025

Via

Access Paper or Ask Questions

KubeGuard: LLM-Assisted Kubernetes Hardening via Configuration Files and Runtime Logs Analysis

Sep 04, 2025

Omri Sgan Cohen, Ehud Malul, Yair Meidan, Dudu Mimran, Yuval Elovici, Asaf Shabtai

Abstract:The widespread adoption of Kubernetes (K8s) for orchestrating cloud-native applications has introduced significant security challenges, such as misconfigured resources and overly permissive configurations. Failing to address these issues can result in unauthorized access, privilege escalation, and lateral movement within clusters. Most existing K8s security solutions focus on detecting misconfigurations, typically through static analysis or anomaly detection. In contrast, this paper presents KubeGuard, a novel runtime log-driven recommender framework aimed at mitigating risks by addressing overly permissive configurations. KubeGuard is designed to harden K8s environments through two complementary tasks: Resource Creation and Resource Refinement. It leverages large language models (LLMs) to analyze manifests and runtime logs reflecting actual system behavior, using modular prompt-chaining workflows. This approach enables KubeGuard to create least-privilege configurations for new resources and refine existing manifests to reduce the attack surface. KubeGuard's output manifests are presented as recommendations that users (e.g., developers and operators) can review and adopt to enhance cluster security. Our evaluation demonstrates that KubeGuard effectively generates and refines K8s manifests for Roles, NetworkPolicies, and Deployments, leveraging both proprietary and open-source LLMs. The high precision, recall, and F1-scores affirm KubeGuard's practicality as a framework that translates runtime observability into actionable, least-privilege configuration guidance.

Via

Access Paper or Ask Questions

Unified Representation Learning for Multi-Intent Diversity and Behavioral Uncertainty in Recommender Systems

Sep 04, 2025

Wei Xu, Jiasen Zheng, Junjiang Lin, Mingxuan Han, Junliang Du

Abstract:This paper addresses the challenge of jointly modeling user intent diversity and behavioral uncertainty in recommender systems. A unified representation learning framework is proposed. The framework builds a multi-intent representation module and an uncertainty modeling mechanism. It extracts multi-granularity interest structures from user behavior sequences. Behavioral ambiguity and preference fluctuation are captured using Bayesian distribution modeling. In the multi-intent modeling part, the model introduces multiple latent intent vectors. These vectors are weighted and fused using an attention mechanism to generate semantically rich representations of long-term user preferences. In the uncertainty modeling part, the model learns the mean and covariance of behavior representations through Gaussian distributions. This reflects the user's confidence in different behavioral contexts. Next, a learnable fusion strategy is used to combine long-term intent and short-term behavior signals. This produces the final user representation, improving both recommendation accuracy and robustness. The method is evaluated on standard public datasets. Experimental results show that it outperforms existing representative models across multiple metrics. It also demonstrates greater stability and adaptability under cold-start and behavioral disturbance scenarios. The approach alleviates modeling bottlenecks faced by traditional methods when dealing with complex user behavior. These findings confirm the effectiveness and practical value of the unified modeling strategy in real-world recommendation tasks.

Via

Access Paper or Ask Questions

What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models

Sep 04, 2025

Pierre Le Coz, Jia An Liu, Debarun Bhattacharjya, Georgina Curto, Serge Stinckwich

Abstract:Large language models (LLMs) are increasingly being adopted in high-stakes domains. Their capacity to process vast amounts of unstructured data, explore flexible scenarios, and handle a diversity of contextual factors can make them uniquely suited to provide new insights for the complexity of social policymaking. This article evaluates whether LLMs' are aligned with domain experts (and among themselves) to inform social policymaking on the subject of homelessness alleviation - a challenge affecting over 150 million people worldwide. We develop a novel benchmark comprised of decision scenarios with policy choices across four geographies (South Bend, USA; Barcelona, Spain; Johannesburg, South Africa; Macau SAR, China). The policies in scope are grounded in the conceptual framework of the Capability Approach for human development. We also present an automated pipeline that connects the benchmarked policies to an agent-based model, and we explore the social impact of the recommended policies through simulated social scenarios. The paper results reveal promising potential to leverage LLMs for social policy making. If responsible guardrails and contextual calibrations are introduced in collaboration with local domain experts, LLMs can provide humans with valuable insights, in the form of alternative policies at scale.

Via

Access Paper or Ask Questions

No Thoughts Just AI: Biased LLM Recommendations Limit Human Agency in Resume Screening

Sep 04, 2025

Kyra Wilson, Mattea Sim, Anna-Maria Gueorguieva, Aylin Caliskan

Abstract:In this study, we conduct a resume-screening experiment (N=528) where people collaborate with simulated AI models exhibiting race-based preferences (bias) to evaluate candidates for 16 high and low status occupations. Simulated AI bias approximates factual and counterfactual estimates of racial bias in real-world AI systems. We investigate people's preferences for White, Black, Hispanic, and Asian candidates (represented through names and affinity groups on quality-controlled resumes) across 1,526 scenarios and measure their unconscious associations between race and status using implicit association tests (IATs), which predict discriminatory hiring decisions but have not been investigated in human-AI collaboration. When making decisions without AI or with AI that exhibits no race-based preferences, people select all candidates at equal rates. However, when interacting with AI favoring a particular group, people also favor those candidates up to 90% of the time, indicating a significant behavioral shift. The likelihood of selecting candidates whose identities do not align with common race-status stereotypes can increase by 13% if people complete an IAT before conducting resume screening. Finally, even if people think AI recommendations are low quality or not important, their decisions are still vulnerable to AI bias under certain circumstances. This work has implications for people's autonomy in AI-HITL scenarios, AI and work, design and evaluation of AI hiring systems, and strategies for mitigating bias in collaborative decision-making tasks. In particular, organizational and regulatory policy should acknowledge the complex nature of AI-HITL decision making when implementing these systems, educating people who use them, and determining which are subject to oversight.

* Published in Proceedings of the 2025 AAAI/ACM Conference on AI, Ethics, and Society; code available at https://github.com/kyrawilson/No-Thoughts-Just-AI

Via

Access Paper or Ask Questions

Efficient Item ID Generation for Large-Scale LLM-based Recommendation

Sep 03, 2025

Anushya Subbiah, Vikram Aggarwal, James Pine, Steffen Rendle, Krishna Sayana, Kun Su

Abstract:Integrating product catalogs and user behavior into LLMs can enhance recommendations with broad world knowledge, but the scale of real-world item catalogs, often containing millions of discrete item identifiers (Item IDs), poses a significant challenge. This contrasts with the smaller, tokenized text vocabularies typically used in LLMs. The predominant view within the LLM-based recommendation literature is that it is infeasible to treat item ids as a first class citizen in the LLM and instead some sort of tokenization of an item into multiple tokens is required. However, this creates a key practical bottleneck in serving these models for real-time low-latency applications. Our paper challenges this predominant practice and integrates item ids as first class citizens into the LLM. We provide simple, yet highly effective, novel training and inference modifications that enable single-token representations of items and single-step decoding. Our method shows improvements in recommendation quality (Recall and NDCG) over existing techniques on the Amazon shopping datasets while significantly improving inference efficiency by 5x-14x. Our work offers an efficiency perspective distinct from that of other popular approaches within LLM-based recommendation, potentially inspiring further research and opening up a new direction for integrating IDs into LLMs. Our code is available here https://drive.google.com/file/d/1cUMj37rV0Z1bCWMdhQ6i4q4eTRQLURtC

Via

Access Paper or Ask Questions

LLMs for estimating positional bias in logged interaction data

Sep 03, 2025

Aleksandr V. Petrov, Michael Murtagh, Karthik Nagesh

Abstract:Recommender and search systems commonly rely on Learning To Rank models trained on logged user interactions to order items by predicted relevance. However, such interaction data is often subject to position bias, as users are more likely to click on items that appear higher in the ranking, regardless of their actual relevance. As a result, newly trained models may inherit and reinforce the biases of prior ranking models rather than genuinely improving relevance. A standard approach to mitigate position bias is Inverse Propensity Scoring (IPS), where the model's loss is weighted by the inverse of a propensity function, an estimate of the probability that an item at a given position is examined. However, accurate propensity estimation is challenging, especially in interfaces with complex non-linear layouts. In this paper, we propose a novel method for estimating position bias using Large Language Models (LLMs) applied to logged user interaction data. This approach offers a cost-effective alternative to online experimentation. Our experiments show that propensities estimated with our LLM-as-a-judge approach are stable across score buckets and reveal the row-column effects of Viator's grid layout that simpler heuristics overlook. An IPS-weighted reranker trained with these propensities matches the production model on standard NDCG@10 while improving weighted NDCG@10 by roughly 2%. We will verify these offline gains in forthcoming live-traffic experiments.

* Accepted at the CONSEQUENCES Workshop @ RecSys'25

Via

Access Paper or Ask Questions

ACT: Automated Constraint Targeting for Multi-Objective Recommender Systems

Sep 03, 2025

Daryl Chang, Yi Wu, Jennifer She, Li Wei, Lukasz Heldt

Abstract:Recommender systems often must maximize a primary objective while ensuring secondary ones satisfy minimum thresholds, or "guardrails." This is critical for maintaining a consistent user experience and platform ecosystem, but enforcing these guardrails despite orthogonal system changes is challenging and often requires manual hyperparameter tuning. We introduce the Automated Constraint Targeting (ACT) framework, which automatically finds the minimal set of hyperparameter changes needed to satisfy these guardrails. ACT uses an offline pairwise evaluation on unbiased data to find solutions and continuously retrains to adapt to system and user behavior changes. We empirically demonstrate its efficacy and describe its deployment in a large-scale production environment.

Via

Access Paper or Ask Questions

Topic:Recommendation

Papers and Code