The rise of generative artificial intelligence, particularly Large Language Models (LLMs), has intensified the imperative to scrutinize fairness alongside accuracy. Recent studies have begun to investigate fairness evaluations for LLMs within domains such as recommendations. Given that personalization is an intrinsic aspect of recommendation systems, its incorporation into fairness assessments is paramount. Yet, the degree to which current fairness evaluation frameworks account for personalization remains unclear. Our comprehensive literature review aims to fill this gap by examining how existing frameworks handle fairness evaluations of LLMs, with a focus on the integration of personalization factors. Despite an exhaustive collection and analysis of relevant works, we discovered that most evaluations overlook personalization, a critical facet of recommendation systems, thereby inadvertently perpetuating unfair practices. Our findings shed light on this oversight and underscore the urgent need for more nuanced fairness evaluations that acknowledge personalization. Such improvements are vital for fostering equitable development within the AI community.
In this age where data is abundant, the ability to distill meaningful insights from the sea of information is essential. Our research addresses the computational and resource inefficiencies that current Sequential Recommender Systems (SRSs) suffer from. especially those employing attention-based models like SASRec, These systems are designed for next-item recommendations in various applications, from e-commerce to social networks. However, such systems suffer from substantial computational costs and resource consumption during the inference stage. To tackle these issues, our research proposes a novel method that combines automatic pruning techniques with advanced model architectures. We also explore the potential of resource-constrained Neural Architecture Search (NAS), a technique prevalent in the realm of recommendation systems, to fine-tune models for reduced FLOPs, latency, and energy usage while retaining or even enhancing accuracy. The main contribution of our work is developing the Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems (EASRec). This approach aims to find optimal compact architectures for attention-based SRSs, ensuring accuracy retention. EASRec introduces data-aware gates that leverage historical information from input data batch to improve the performance of the recommendation network. Additionally, it utilizes a dynamic resource constraint approach, which standardizes the search process and results in more appropriate architectures. The effectiveness of our methodology is validated through exhaustive experiments on three benchmark datasets, which demonstrates EASRec's superiority in SRSs. Our research set a new standard for future exploration into efficient and accurate recommender systems, signifying a substantial advancement within this swiftly advancing field.
Modern recommender systems may output considerably different recommendations due to small perturbations in the training data. Changes in the data from a single user will alter the recommendations as well as the recommendations of other users. In applications like healthcare, housing, and finance, this sensitivity can have adverse effects on user experience. We propose a method to stabilize a given recommender system against such perturbations. This is a challenging task due to (1) the lack of a ``reference'' rank list that can be used to anchor the outputs; and (2) the computational challenges in ensuring the stability of rank lists with respect to all possible perturbations of training data. Our method, FINEST, overcomes these challenges by obtaining reference rank lists from a given recommendation model and then fine-tuning the model under simulated perturbation scenarios with rank-preserving regularization on sampled items. Our experiments on real-world datasets demonstrate that FINEST can ensure that recommender models output stable recommendations under a wide range of different perturbations without compromising next-item prediction accuracy.
Predicting Click-Through Rate (CTR) in billion-scale recommender systems poses a long-standing challenge for Graph Neural Networks (GNNs) due to the overwhelming computational complexity involved in aggregating billions of neighbors. To tackle this, GNN-based CTR models usually sample hundreds of neighbors out of the billions to facilitate efficient online recommendations. However, sampling only a small portion of neighbors results in a severe sampling bias and the failure to encompass the full spectrum of user or item behavioral patterns. To address this challenge, we name the conventional user-item recommendation graph as "micro recommendation graph" and introduce a more suitable MAcro Recommendation Graph (MAG) for billion-scale recommendations. MAG resolves the computational complexity problems in the infrastructure by reducing the node count from billions to hundreds. Specifically, MAG groups micro nodes (users and items) with similar behavior patterns to form macro nodes. Subsequently, we introduce tailored Macro Graph Neural Networks (MacGNN) to aggregate information on a macro level and revise the embeddings of macro nodes. MacGNN has already served Taobao's homepage feed for two months, providing recommendations for over one billion users. Extensive offline experiments on three public benchmark datasets and an industrial dataset present that MacGNN significantly outperforms twelve CTR baselines while remaining computationally efficient. Besides, online A/B tests confirm MacGNN's superiority in billion-scale recommender systems.
Cloud-based large language models (LLMs) such as ChatGPT have increasingly become integral to daily operations, serving as vital tools across various applications. While these models offer substantial benefits in terms of accessibility and functionality, they also introduce significant privacy concerns: the transmission and storage of user data in cloud infrastructures pose substantial risks of data breaches and unauthorized access to sensitive information; even if the transmission and storage of data is encrypted, the LLM service provider itself still knows the real contents of the data, preventing individuals or entities from confidently using such LLM services. To address these concerns, this paper proposes a simple yet effective mechanism PromptCrypt to protect user privacy. It uses Emoji to encrypt the user inputs before sending them to LLM, effectively rendering them indecipherable to human or LLM's examination while retaining the original intent of the prompt, thus ensuring the model's performance remains unaffected. We conduct experiments on three tasks, personalized recommendation, sentiment analysis, and tabular data analysis. Experiment results reveal that PromptCrypt can encrypt personal information within prompts in such a manner that not only prevents the discernment of sensitive data by humans or LLM itself, but also maintains or even improves the precision without further tuning, achieving comparable or even better task accuracy than directly prompting the LLM without prompt encryption. These results highlight the practicality of adopting encryption measures that safeguard user privacy without compromising the functional integrity and performance of LLMs. Code and dataset are available at https://github.com/agiresearch/PromptCrypt.
Computation of document similarity is a critical task in various NLP domains that has applications in deduplication, matching, and recommendation. Traditional approaches for document similarity computation include learning representations of documents and employing a similarity or a distance function over the embeddings. However, pairwise similarities and differences are not efficiently captured by individual representations. Graph representations such as Joint Concept Interaction Graph (JCIG) represent a pair of documents as a joint undirected weighted graph. JCIGs facilitate an interpretable representation of document pairs as a graph. However, JCIGs are undirected, and don't consider the sequential flow of sentences in documents. We propose two approaches to model document similarity by representing document pairs as a directed and sparse JCIG that incorporates sequential information. We propose two algorithms inspired by Supergenome Sorting and Hamiltonian Path that replace the undirected edges with directed edges. Our approach also sparsifies the graph to $O(n)$ edges from JCIG's worst case of $O(n^2)$. We show that our sparse directed graph model architecture consisting of a Siamese encoder and GCN achieves comparable results to the baseline on datasets not containing sequential information and beats the baseline by ten points on an instructional documents dataset containing sequential information.
Relationship inference from sparse data is an important task with applications ranging from product recommendation to drug discovery. A recently proposed linear model for sparse matrix completion has demonstrated surprising advantage in speed and accuracy over more sophisticated recommender systems algorithms. Here we extend the linear model to develop a shallow autoencoder for the dual neighborhood-regularized matrix completion problem. We demonstrate the speed and accuracy advantage of our approach over the existing state-of-the-art in predicting drug-target interactions and drug-disease associations.
This study explores the nuanced capabilities and inherent biases of Recommender Systems using Large Language Models (RecLLMs), with a focus on ChatGPT-based systems. It studies into the contrasting behaviors of generative models and traditional collaborative filtering models in movie recommendations. The research primarily investigates prompt design strategies and their impact on various aspects of recommendation quality, including accuracy, provider fairness, diversity, stability, genre dominance, and temporal freshness (recency). Our experimental analysis reveals that the introduction of specific 'system roles' and 'prompt strategies' in RecLLMs significantly influences their performance. For instance, role-based prompts enhance fairness and diversity in recommendations, mitigating popularity bias. We find that while GPT-based models do not always match the performance of CF baselines, they exhibit a unique tendency to recommend newer and more diverse movie genres. Notably, GPT-based models tend to recommend more recent films, particularly those released post-2000, and show a preference for genres like \sq{Drama} and Comedy, and Romance (compared to CF Action, Adventure) presumably due to the RecLLMs' training on varied data sets, which allows them to capture recent trends and discussions more effectively than CF models. Interestingly, our results demonstrate that the 'Simple' and 'Chain of Thought (COT)' paradigms yield the highest accuracy. These findings imply the potential of combining these strategies with scenarios that favor more recent content, thereby offering a more balanced and up-to-date recommendation experience. This study contributes significantly to the understanding of emerging RecLLMs, particularly in the context of harms and biases within these systems.
Multi-interest learning method for sequential recommendation aims to predict the next item according to user multi-faceted interests given the user historical interactions. Existing methods mainly consist of two modules: the multi-interest extraction module that learns user multi-interest embeddings to capture the user multi-interests, and the multi-interest weight prediction module that learns the weight of each interest for aggregating the learned multi-interest embeddings to derive the user embedding, used for predicting the user rating to an item. Despite their effectiveness, existing methods have two key limitations: 1) they directly feed the user interactions into the two modules, while ignoring their different learning objectives, and 2) they merely consider the centrality of the user interactions to learn the user multi-interests, while overlooking their dispersion. To tackle these limitations, we propose a prompt-based multi-interest learning method (PoMRec), where specific prompts are inserted into user interactions to make them adaptive to different learning objectives of the two modules. Moreover, we utilize both the mean and variance embeddings of user interactions to derive the user multi-interest embeddings for comprehensively model the user multi-interests. We conduct extensive experiments on two public datasets, and the results verify that our proposed PoMRec outperforms the state-of-the-art multi-interest learning methods.
Large language models (LLMs) have demonstrated remarkable capabilities and have been extensively deployed across various domains, including recommender systems. Numerous studies have employed specialized \textit{prompts} to harness the in-context learning capabilities intrinsic to LLMs. For example, LLMs are prompted to act as zero-shot rankers for listwise ranking, evaluating candidate items generated by a retrieval model for recommendation. Recent research further uses instruction tuning techniques to align LLM with human preference for more promising recommendations. Despite its potential, current research overlooks the integration of multiple ranking tasks to enhance model performance. Moreover, the signal from the conventional recommendation model is not integrated into the LLM, limiting the current system performance. In this paper, we introduce RecRanker, tailored for instruction tuning LLM to serve as the \textbf{Ranker} for top-\textit{k} \textbf{Rec}ommendations. Specifically, we introduce importance-aware sampling, clustering-based sampling, and penalty for repetitive sampling for sampling high-quality, representative, and diverse training data. To enhance the prompt, we introduce position shifting strategy to mitigate position bias and augment the prompt with auxiliary information from conventional recommendation models, thereby enriching the contextual understanding of the LLM. Subsequently, we utilize the sampled data to assemble an instruction-tuning dataset with the augmented prompt comprising three distinct ranking tasks: pointwise, pairwise, and listwise rankings. We further propose a hybrid ranking method to enhance the model performance by ensembling these ranking tasks. Our empirical evaluations demonstrate the effectiveness of our proposed RecRanker in both direct and sequential recommendation scenarios.