Alert button
Picture for Keqin Bao

Keqin Bao

Alert button

CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation

Oct 30, 2023
Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, Xiangnan He

Leveraging Large Language Models as Recommenders (LLMRec) has gained significant attention and introduced fresh perspectives in user preference modeling. Existing LLMRec approaches prioritize text semantics, usually neglecting the valuable collaborative information from user-item interactions in recommendations. While these text-emphasizing approaches excel in cold-start scenarios, they may yield sub-optimal performance in warm-start situations. In pursuit of superior recommendations for both cold and warm start scenarios, we introduce CoLLM, an innovative LLMRec methodology that seamlessly incorporates collaborative information into LLMs for recommendation. CoLLM captures collaborative information through an external traditional model and maps it to the input token embedding space of LLM, forming collaborative embeddings for LLM usage. Through this external integration of collaborative information, CoLLM ensures effective modeling of collaborative information without modifying the LLM itself, providing the flexibility to employ various collaborative information modeling techniques. Extensive experiments validate that CoLLM adeptly integrates collaborative information into LLMs, resulting in enhanced recommendation performance. We release the code and data at https://github.com/zyang1580/CoLLM.

Viaarxiv icon

A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems

Aug 16, 2023
Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yancheng Luo, Fuli Feng, Xiangnaan He, Qi Tian

As the focus on Large Language Models (LLMs) in the field of recommendation intensifies, the optimization of LLMs for recommendation purposes (referred to as LLM4Rec) assumes a crucial role in augmenting their effectiveness in providing recommendations. However, existing approaches for LLM4Rec often assess performance using restricted sets of candidates, which may not accurately reflect the models' overall ranking capabilities. In this paper, our objective is to investigate the comprehensive ranking capacity of LLMs and propose a two-step grounding framework known as BIGRec (Bi-step Grounding Paradigm for Recommendation). It initially grounds LLMs to the recommendation space by fine-tuning them to generate meaningful tokens for items and subsequently identifies appropriate actual items that correspond to the generated tokens. By conducting extensive experiments on two datasets, we substantiate the superior performance, capacity for handling few-shot scenarios, and versatility across multiple domains exhibited by BIGRec. Furthermore, we observe that the marginal benefits derived from increasing the quantity of training samples are modest for BIGRec, implying that LLMs possess the limited capability to assimilate statistical information, such as popularity and collaborative filtering, due to their robust semantic priors. These findings also underline the efficacy of integrating diverse statistical information into the LLM4Rec framework, thereby pointing towards a potential avenue for future research. Our code and data are available at https://github.com/SAI990323/Grounding4Rec.

* 17 pages 
Viaarxiv icon

Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation

May 12, 2023
Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He

Figure 1 for Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation
Figure 2 for Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation
Figure 3 for Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation
Figure 4 for Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation

The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm -- Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes1 in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.

* 10 pages 
Viaarxiv icon

TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

Apr 30, 2023
Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He

Figure 1 for TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
Figure 2 for TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
Figure 3 for TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
Figure 4 for TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation

Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains, thereby prompting researchers to explore their potential for use in recommendation systems. Initial attempts have leveraged the exceptional capabilities of LLMs, such as rich knowledge and strong generalization through In-context Learning, which involves phrasing the recommendation task as prompts. Nevertheless, the performance of LLMs in recommendation tasks remains suboptimal due to a substantial disparity between the training tasks for LLMs and recommendation tasks, as well as inadequate recommendation data during pre-training. To bridge the gap, we consider building a Large Recommendation Language Model by tunning LLMs with recommendation data. To this end, we propose an efficient and effective Tuning framework for Aligning LLMs with Recommendation, namely TALLRec. We have demonstrated that the proposed TALLRec framework can significantly enhance the recommendation capabilities of LLMs in the movie and book domains, even with a limited dataset of fewer than 100 samples. Additionally, the proposed framework is highly efficient and can be executed on a single RTX 3090 with LLaMA-7B. Furthermore, the fine-tuned LLM exhibits robust cross-domain generalization. Our code and data are available at https://github.com/SAI990323/TALLRec.

* Technical Report 
Viaarxiv icon

Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors

Feb 17, 2023
Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie

Figure 1 for Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors
Figure 2 for Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors
Figure 3 for Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors
Figure 4 for Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors

Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches can not synchronously consider error position and type, failing to integrate the error information of both. In this paper, we propose Fine-Grained Translation Error Detection (FG-TED) task, aiming at identifying both the position and the type of translation errors on given source-hypothesis sentence pairs. Besides, we build an FG-TED model to predict the \textbf{addition} and \textbf{omission} errors -- two typical translation accuracy errors. First, we use a word-level classification paradigm to form our model and use the shortcut learning reduction to relieve the influence of monolingual features. Besides, we construct synthetic datasets for model training, and relieve the disagreement of data labeling in authoritative datasets, making the experimental benchmark concordant. Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results on the restored dataset. Our model also delivers more reliable predictions on low-resource and transfer scenarios than existing baselines. The related datasets and the source code will be released in the future.

Viaarxiv icon

Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task

Oct 18, 2022
Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie

Figure 1 for Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task
Figure 2 for Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task
Figure 3 for Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task

In this paper, we present our submission to the sentence-level MQM benchmark at Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation). Specifically, our systems employ the framework of UniTE, which combined three types of input formats during training with a pre-trained language model. First, we apply the pseudo-labeled data examples for the continuously pre-training phase. Notably, to reduce the gap between pre-training and fine-tuning, we use data pruning and a ranking-based score normalization strategy. For the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions. Finally, we collect the source-only evaluation results, and ensemble the predictions generated by two UniTE models, whose backbones are XLM-R and InfoXLM, respectively. Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings, showing relatively strong performances in this year's quality estimation competition.

* WMT 2022 QE Shared Task. arXiv admin note: text overlap with arXiv:2210.09683 
Viaarxiv icon

Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task

Oct 18, 2022
Yu Wan, Keqin Bao, Dayiheng Liu, Baosong Yang, Derek F. Wong, Lidia S. Chao, Wenqiang Lei, Jun Xie

Figure 1 for Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task

In this report, we present our submission to the WMT 2022 Metrics Shared Task. We build our system based on the core idea of UNITE (Unified Translation Evaluation), which unifies source-only, reference-only, and source-reference-combined evaluation scenarios into one single model. Specifically, during the model pre-training phase, we first apply the pseudo-labeled data examples to continuously pre-train UNITE. Notably, to reduce the gap between pre-training and fine-tuning, we use data cropping and a ranking-based score normalization strategy. During the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions. Specially, we collect the results from models with different pre-trained language model backbones, and use different ensembling strategies for involved translation directions.

* WMT 2022 Metrics Shared Task 
Viaarxiv icon