Julian McAuley

Foundation Models for Recommender Systems: A Survey and New Perspectives

Feb 17, 2024

Chengkai Huang, Tong Yu, Kaige Xie, Shuai Zhang, Lina Yao, Julian McAuley

Abstract: Recently, Foundation Models (FMs), with their extensive knowledge bases and complex architectures, have offered unique opportunities within the realm of recommender systems (RSs). In this paper, we attempt to thoroughly examine FM-based recommendation systems (FM4RecSys). We start by reviewing the research background of FM4RecSys. Then, we provide a systematic taxonomy of existing FM4RecSys research works, which can be divided into four different parts including data characteristics, representation learning, model type, and downstream tasks. Within each part, we review the key recent research developments, outlining the representative models and discussing their characteristics. Moreover, we elaborate on the open problems and opportunities of FM4RecSys aiming to shed light on future research directions in this area. In conclusion, we recap our findings and discuss the emerging trends in this field.

How to Train Data-Efficient LLMs

Feb 15, 2024

Noveen Sachdeva, Benjamin Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed H. Chi, James Caverlee, Julian McAuley, Derek Zhiyuan Cheng

Abstract: The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximization of coverage and diversity-based measures in the feature space. Our first technique, Ask-LLM, leverages the zero-shot reasoning capabilities of instruction-tuned LLMs to directly assess the quality of a training example. To target coverage, we propose Density sampling, which models the data distribution to select a diverse sample. In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories. Coverage sampling can recover the performance of the full data, while models trained on Ask-LLM data consistently outperform full-data training -- even when we reject 90% of the original dataset, while converging up to 70% faster.

* Under review. 44 pages, 30 figures
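
The coverage idea in the abstract above (model the data distribution, then select a diverse sample) can be illustrated with a small sketch. This is not the paper's Density sampler: the Gaussian kernel, the `bandwidth` parameter, the inverse-density weighting, and the function names are all assumptions chosen for illustration, and the quadratic-time density estimate would not scale to a pre-training corpus.

```python
import math
import random


def kernel_density_scores(embeddings, bandwidth=1.0):
    """Score each point by the sum of Gaussian kernel values to every point.

    A high score means the point sits in a crowded region of feature space.
    """
    scores = []
    for x in embeddings:
        total = 0.0
        for y in embeddings:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        scores.append(total)
    return scores


def inverse_density_sample(embeddings, k, bandwidth=1.0, seed=0):
    """Pick k distinct indices, favouring points in low-density regions."""
    rng = random.Random(seed)
    scores = kernel_density_scores(embeddings, bandwidth)
    # Assumption: weight each example by the inverse of its density estimate.
    weights = [1.0 / s for s in scores]
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # draw key = u ** (1 / w) per item and keep the k largest keys.
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:k])


# Toy data: five near-duplicate points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
chosen = inverse_density_sample(points, k=3)
```

Under this weighting the isolated point is far more likely to be kept than any single near-duplicate, which is the sense in which the sample is "diverse".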

InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment

Feb 13, 2024

Jianing Wang, Junda Wu, Yupeng Hou, Yao Liu, Ming Gao, Julian McAuley

Abstract: Do current large language models (LLMs) better solve graph reasoning and generation tasks with parameter updates? In this paper, we propose InstructGraph, a framework that empowers LLMs with the abilities of graph reasoning and generation by instruction tuning and preference alignment. Specifically, we first propose a structured format verbalizer to unify all graph data into a universal code-like format, which can simply represent the graph without any external graph-specific encoders. Furthermore, a graph instruction tuning stage is introduced to guide LLMs in solving graph reasoning and generation tasks. Finally, we identify potential hallucination problems in graph tasks and sample negative instances for preference alignment, the target of which is to enhance the output's reliability of the model. Extensive experiments across multiple graph-centric tasks exhibit that InstructGraph can achieve the best performance and outperform GPT-4 and LLaMA2 by more than 13% and 38%, respectively.

* 19 pages
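
To make the "code-like format" idea in the abstract above concrete, here is a minimal sketch of a verbalizer that turns a small graph into plain text an LLM could read without a graph encoder. The function name `verbalize_graph` and the exact output syntax are hypothetical; the abstract does not specify the format, so this should not be read as InstructGraph's actual representation.

```python
def verbalize_graph(name, nodes, edges, directed=True):
    """Render a graph as a code-like text block.

    `edges` is a list of (head, relation, tail) triples.
    The syntax below is invented for illustration only.
    """
    arrow = "->" if directed else "--"
    lines = [f"Graph[name={name!r}] {{"]
    lines.append("    node_list = [" + ", ".join(repr(n) for n in nodes) + "];")
    lines.append("    edge_list = [")
    for head, relation, tail in edges:
        lines.append(f"        ({head!r} {arrow} {tail!r})[relation={relation!r}],")
    lines.append("    ];")
    lines.append("}")
    return "\n".join(lines)


text = verbalize_graph(
    "toy-knowledge-graph",
    ["Alice", "Bob"],
    [("Alice", "knows", "Bob")],
)
print(text)
```

Because the result is ordinary text, it can be placed directly in an instruction prompt alongside the task description.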

MEMORYLLM: Towards Self-Updatable Large Language Models

Feb 07, 2024

Yu Wang, Xiusi Chen, Jingbo Shang, Julian McAuley

Abstract: Existing Large Language Models (LLMs) usually remain static after deployment, which might make it hard to inject new knowledge into the model. We aim to build models containing a considerable portion of self-updatable parameters, enabling the model to integrate new knowledge effectively and efficiently. To this end, we introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memory pool within the latent space of the transformer. MEMORYLLM can self-update with text knowledge and memorize the knowledge injected earlier. Our evaluations demonstrate the ability of MEMORYLLM to effectively incorporate new knowledge, as evidenced by its performance on model editing benchmarks. Meanwhile, the model exhibits long-term information retention capacity, which is validated through our custom-designed evaluations and long-context benchmarks. MEMORYLLM also shows operational integrity without any sign of performance degradation even after nearly a million memory updates.

* 13 pages, 9 figures

FINEST: Stabilizing Recommendations by Rank-Preserving Fine-Tuning

Feb 05, 2024

Sejoon Oh, Berk Ustun, Julian McAuley, Srijan Kumar

Abstract: Modern recommender systems may output considerably different recommendations due to small perturbations in the training data. Changes in the data from a single user will alter the recommendations as well as the recommendations of other users. In applications like healthcare, housing, and finance, this sensitivity can have adverse effects on user experience. We propose a method to stabilize a given recommender system against such perturbations. This is a challenging task due to (1) the lack of a "reference" rank list that can be used to anchor the outputs; and (2) the computational challenges in ensuring the stability of rank lists with respect to all possible perturbations of training data. Our method, FINEST, overcomes these challenges by obtaining reference rank lists from a given recommendation model and then fine-tuning the model under simulated perturbation scenarios with rank-preserving regularization on sampled items. Our experiments on real-world datasets demonstrate that FINEST can ensure that recommender models output stable recommendations under a wide range of different perturbations without compromising next-item prediction accuracy.

* Accepted at the 6th FAccTRec Workshop on Responsible Recommendation @ ACM RecSys 2023
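
The phrase "rank-preserving regularization on sampled items" in the abstract above can be illustrated with a simple pairwise hinge penalty. This is a sketch under stated assumptions, not FINEST's actual loss: the hinge form, the `margin` and `num_samples` parameters, and the function name are all hypothetical. Items are sampled from the reference rank list, and the penalty grows whenever the scores from the perturbed model invert the reference order of two sampled neighbours.

```python
import random


def rank_preserving_penalty(reference_ranking, scores, num_samples=4, margin=0.0, seed=0):
    """Penalise score orderings that contradict a reference rank list.

    reference_ranking: item ids, best first (from the original model).
    scores: dict mapping item id -> score under the perturbed model.
    """
    rng = random.Random(seed)
    k = min(num_samples, len(reference_ranking))
    # Sample positions, then restore reference order among the sampled items.
    positions = sorted(rng.sample(range(len(reference_ranking)), k))
    sampled = [reference_ranking[p] for p in positions]
    penalty = 0.0
    for higher, lower in zip(sampled, sampled[1:]):
        # Zero when `higher` still outscores `lower` by at least `margin`.
        penalty += max(0.0, margin - (scores[higher] - scores[lower]))
    return penalty


reference = ["a", "b", "c", "d"]
stable = rank_preserving_penalty(reference, {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0})
shuffled = rank_preserving_penalty(reference, {"a": 1.0, "b": 3.0, "c": 2.0, "d": 4.0})
```

In a fine-tuning loop, a term like this would be added to the usual recommendation loss so that accuracy and rank stability are optimised together.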

InfoRank: Unbiased Learning-to-Rank via Conditional Mutual Information Minimization

Jan 23, 2024

Jiarui Jin, Zexue He, Mengyue Yang, Weinan Zhang, Yong Yu, Jun Wang, Julian McAuley

Abstract: Ranking items regarding individual user interests is a core technique of multiple downstream tasks such as recommender systems. Learning such a personalized ranker typically relies on the implicit feedback from users' past click-through behaviors. However, collected feedback is biased toward previously highly-ranked items and directly learning from it would result in a "rich-get-richer" phenomenon. In this paper, we propose a simple yet sufficient unbiased learning-to-rank paradigm named InfoRank that aims to simultaneously address both position and popularity biases. We begin by consolidating the impacts of those biases into a single observation factor, thereby providing a unified approach to addressing bias-related issues. Subsequently, we minimize the mutual information between the observation estimation and the relevance estimation conditioned on the input features. By doing so, our relevance estimation can be proved to be free of bias. To implement InfoRank, we first incorporate an attention mechanism to capture latent correlations within user-item features, thereby generating estimations of observation and relevance. We then introduce a regularization term, grounded in conditional mutual information, to promote conditional independence between relevance estimation and observation estimation. Experimental evaluations conducted across three extensive recommendation and search datasets reveal that InfoRank learns more precise and unbiased ranking strategies.

* WWW 2024

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Jan 22, 2024

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

Abstract: We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose framework for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate a surprisingly wide range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control - all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at https://DITTO-Music.github.io/web/.
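
The core idea in the abstract above, optimizing the initial latent while the generator stays frozen, can be shown on a toy problem. Everything here is a stand-in: `frozen_generator` is a fixed elementwise `tanh` rather than a diffusion sampler, `feature` is a plain mean rather than an audio feature extractor, and gradients come from finite differences rather than backpropagation with gradient checkpointing. It is a sketch of the optimization pattern only, not of DITTO itself.

```python
import math


def frozen_generator(latent):
    """Stand-in for a pre-trained model: fixed, never updated."""
    return [math.tanh(z) for z in latent]


def feature(output):
    """Stand-in feature extractor: mean value of the output."""
    return sum(output) / len(output)


def matching_loss(latent, target):
    """Squared error between the output feature and the target feature."""
    return (feature(frozen_generator(latent)) - target) ** 2


def optimize_latent(latent, target, steps=200, lr=0.5, eps=1e-5):
    """Gradient descent on the latent only, using central differences."""
    latent = list(latent)
    for _ in range(steps):
        grad = []
        for i in range(len(latent)):
            up = latent[:i] + [latent[i] + eps] + latent[i + 1:]
            down = latent[:i] + [latent[i] - eps] + latent[i + 1:]
            grad.append((matching_loss(up, target) - matching_loss(down, target)) / (2 * eps))
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent


start = [0.0, 0.0]
tuned = optimize_latent(start, target=0.5)
print(matching_loss(start, 0.5), matching_loss(tuned, 0.5))
```

The generator's parameters are never touched; only the input latent moves, which is what makes the approach training-free.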

Deciphering Compatibility Relationships with Textual Descriptions via Extraction and Explanation

Dec 17, 2023

Yu Wang, Zexue He, Zhankui He, Hao Xu, Julian McAuley

Abstract: Understanding and accurately explaining compatibility relationships between fashion items is a challenging problem in the burgeoning domain of AI-driven outfit recommendations. Present models, while making strides in this area, still occasionally fall short, offering explanations that can be elementary and repetitive. This work aims to address these shortcomings by introducing the Pair Fashion Explanation (PFE) dataset, a unique resource that has been curated to illuminate these compatibility relationships. Furthermore, we propose an innovative two-stage pipeline model that leverages this dataset. This fine-tuning allows the model to generate explanations that convey the compatibility relationships between items. Our experiments showcase the model's potential in crafting descriptions that are knowledgeable, aligned with ground-truth matching correlations, and that produce understandable and informative descriptions, as assessed by both automatic metrics and human evaluation. Our code and data are released at https://github.com/wangyu-ustc/PairFashionExplanation

* AAAI 2024

Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Controls

Nov 21, 2023

Weihan Xu, Julian McAuley, Shlomo Dubnov, Hao-Wen Dong

Abstract: The "pretraining-and-finetuning" paradigm has become a norm for training domain-specific models in natural language processing and computer vision. In this work, we aim to examine this paradigm for symbolic music generation through leveraging the largest ever symbolic music dataset sourced from the MuseScore forum. We first pretrain a large unconditional transformer model using 1.5 million songs. We then propose a simple technique to equip this pretrained unconditional music transformer model with instrument and genre controls by finetuning the model with additional control tokens. Our proposed representation offers improved high-level controllability and expressiveness against two existing representations. The experimental results show that the proposed model can successfully generate music with user-specified instruments and genre. In a subjective listening test, the proposed model outperforms the pretrained baseline model in terms of coherence, harmony, arrangement and overall quality.

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023

An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao (+2 more)

Abstract: We present MM-Navigator, a GPT-4V-based agent for the smartphone graphical user interface (GUI) navigation task. MM-Navigator can interact with a smartphone screen as human users, and determine subsequent actions to fulfill given instructions. Our findings demonstrate that large multimodal models (LMMs), specifically GPT-4V, excel in zero-shot GUI navigation through its advanced screen interpretation, action reasoning, and precise action localization capabilities. We first benchmark MM-Navigator on our collected iOS screen dataset. According to human assessments, the system exhibited a 91% accuracy rate in generating reasonable action descriptions and a 75% accuracy rate in executing the correct actions for single-step instructions on iOS. Additionally, we evaluate the model on a subset of an Android screen navigation dataset, where the model outperforms previous GUI navigators in a zero-shot fashion. Our benchmark and detailed analyses aim to lay a robust groundwork for future research into the GUI navigation task. The project page is at https://github.com/zzxslp/MM-Navigator.

* Work in progress
