Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent

Apr 19, 2023
Weiwei Sun, Lingyong Yan, Xinyu Ma, Pengjie Ren, Dawei Yin, Zhaochun Ren

Large Language Models (LLMs) have demonstrated a remarkable ability to generalize zero-shot to various language-related tasks. This paper explores generative LLMs such as ChatGPT and GPT-4 for relevance ranking in Information Retrieval (IR). Surprisingly, our experiments reveal that properly instructed ChatGPT and GPT-4 can deliver results that are competitive with, and even superior to, those of supervised methods on popular IR benchmarks. Notably, GPT-4 outperforms monoT5-3B, which is fully fine-tuned on MS MARCO, by an average of 2.7 nDCG on TREC datasets, an average of 2.3 nDCG on eight BEIR datasets, and an average of 2.7 nDCG on ten low-resource languages from Mr.TyDi. Subsequently, we delve into the potential for distilling the ranking capabilities of ChatGPT into a specialized model. Our small specialized model, trained on 10K ChatGPT-generated data, outperforms monoT5 trained on 400K annotated MS MARCO data on BEIR. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT
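The gains above are stated in nDCG, so a small sketch of how that metric scores a re-ranked list may help in reading them. This is an illustrative implementation, not the paper's evaluation code: the function names `dcg_at_k` and `ndcg_at_k`, the exponential gain `2^rel - 1`, the cutoff of 10, and the toy relevance labels are all assumptions made for this example. It also assumes the ranked list contains every judged document, so the ideal ordering can be derived from the list itself.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # Position i (0-based) is discounted by log2(i + 2); gain is 2^rel - 1 (assumed variant).
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels, listed in ranked order.
first_stage = [0, 2, 1, 3]   # the most relevant passage is ranked last
reranked = [3, 2, 1, 0]      # a re-ranker moved it to the top

print(ndcg_at_k(first_stage))
print(ndcg_at_k(reranked))
```

A re-ranker that places the documents in descending order of relevance reaches an nDCG of 1.0, while the first-stage ordering in this toy example scores lower. Benchmark tables often report the value multiplied by 100, which is presumably the scale behind figures such as "2.7 nDCG".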
