Title: DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index

Mar 01, 2022

Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Ji-Rong Wen

Abstract: Web search provides a promising way for people to obtain information and has been studied extensively. With the rise of deep learning and large-scale pre-training techniques, various neural information retrieval models have been proposed, and they have proven effective at improving search quality, especially ranking quality. All of these existing search methods follow a common paradigm, index-retrieve-rerank: they first build an index of all documents based on document terms (i.e., a sparse inverted index) or representation vectors (i.e., a dense vector index), then retrieve candidate documents and rerank them with ranking models based on the similarity between the query and the documents. In this paper, we explore a new paradigm of information retrieval that uses neither a sparse nor a dense index, only a model. Specifically, we propose a pre-training model-based IR system called DynamicRetriever. In this system, the training stage embeds the token-level and document-level information of the corpus (especially document identifiers) into the model parameters, and the inference stage then directly generates document identifiers for a given query. Compared with existing search methods, the model-based IR system has two advantages: i) it parameterizes the traditional static index with a pre-training model, which converts the document semantic mapping into a dynamic and updatable process; ii) with separate document identifiers, it captures both the term-level and document-level information for each document. Extensive experiments on the public search benchmark MS MARCO verify the effectiveness and potential of the proposed paradigm for information retrieval.
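
To make the "model as index" idea in the abstract concrete, here is a minimal toy sketch. It is not the DynamicRetriever architecture, which builds on a pre-trained language model. It only illustrates the general pattern of storing the corpus in trainable parameters and mapping a query straight to a document identifier. The corpus, the class name `ToyModelRetriever`, the bag-of-words softmax classifier, and the hyperparameters are all assumptions made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus (docid -> text); not taken from the paper.
CORPUS = {
    "D1": "neural ranking models improve web search quality",
    "D2": "inverted index maps terms to posting lists",
    "D3": "dense vector index stores document embeddings",
}


class ToyModelRetriever:
    """Bag-of-words softmax classifier over document identifiers.

    The weights are the only place corpus information lives: there is
    no inverted index and no vector index to look up at query time.
    """

    def __init__(self, doc_ids):
        self.doc_ids = list(doc_ids)
        # weights[token][docid], created lazily during training.
        self.weights = defaultdict(lambda: {d: 0.0 for d in self.doc_ids})

    def _probs(self, tokens):
        # Sum per-token weights for each docid; unseen tokens contribute 0.
        scores = {
            d: sum(self.weights[t][d] for t in tokens if t in self.weights)
            for d in self.doc_ids
        }
        top = max(scores.values())
        exps = {d: math.exp(s - top) for d, s in scores.items()}
        total = sum(exps.values())
        return {d: e / total for d, e in exps.items()}

    def train(self, pairs, epochs=50, lr=0.5):
        """Gradient descent on cross-entropy for (text, docid) pairs."""
        for _ in range(epochs):
            for text, target in pairs:
                tokens = text.split()
                probs = self._probs(tokens)
                for t in tokens:
                    for d in self.doc_ids:
                        grad = probs[d] - (1.0 if d == target else 0.0)
                        self.weights[t][d] -= lr * grad

    def retrieve(self, query, k=1):
        """Return the k most probable docids using only the parameters."""
        probs = self._probs(query.split())
        return sorted(self.doc_ids, key=lambda d: -probs[d])[:k]


model = ToyModelRetriever(CORPUS)
# Stage 1 (assumed): memorize document text -> docid.
model.train([(text, doc_id) for doc_id, text in CORPUS.items()])
# Stage 2 (assumed): fit query -> docid pairs.
model.train([("how do posting lists work", "D2")])

print(model.retrieve("dense embeddings index"))
print(model.retrieve("posting lists"))
```

In this sketch, adding a document means more training rather than rebuilding an index structure, which is the sense in which the abstract calls the mapping dynamic and updatable. A classifier over a fixed docid set is a deliberate simplification of generating identifiers with a pre-trained model.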
