Title: Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

Apr 12, 2022

Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu (+3 more)

Abstract: To alleviate data scarcity in training question answering systems, recent work proposes additional intermediate pre-training for dense passage retrieval (DPR). However, a large discrepancy remains between the upstream pre-training signals and downstream question-passage relevance, which limits the improvement. To bridge this gap, we propose HyperLink-induced Pre-training (HLP), a method that pre-trains the dense retriever with the text relevance induced by hyperlink-based topology within Web documents. We demonstrate that the hyperlink-based structures of dual-link and co-mention provide effective relevance signals for large-scale pre-training that better facilitate downstream passage retrieval. We investigate the effectiveness of our approach across a wide range of open-domain QA datasets under zero-shot, few-shot, multi-hop, and out-of-domain scenarios. The experiments show that HLP outperforms BM25 by up to 7 points and other pre-training methods by more than 10 points in top-20 retrieval accuracy under the zero-shot scenario. Furthermore, HLP significantly outperforms other pre-training methods under the other scenarios.
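To make the two hyperlink structures named in the abstract more concrete, the following is a minimal sketch of how candidate pairs might be mined from a link graph. It rests on a simplified reading that is an assumption here, not the paper's exact definition: a dual-link pair is two passages that hyperlink to each other, and a co-mention pair is two passages that both hyperlink to the same third item. The function name `mine_pairs` and the toy graph are hypothetical and are not taken from the HLP repository.

```python
from collections import defaultdict
from itertools import combinations


def mine_pairs(links):
    """Mine candidate relevance pairs from a hyperlink graph.

    `links` maps an id to the set of ids it hyperlinks to.
    Assumed (simplified) definitions:
      - dual-link: a links to b AND b links to a
      - co-mention: a and b both link to the same third item
    Returns two sets of sorted (a, b) tuples.
    """
    # Dual-link: keep pairs whose links point in both directions.
    dual_link = set()
    for a, targets in links.items():
        for b in targets:
            if a != b and a in links.get(b, set()):
                dual_link.add(tuple(sorted((a, b))))

    # Invert the graph: for each target, collect who links to it.
    mentioned_by = defaultdict(set)
    for src, targets in links.items():
        for tgt in targets:
            if src != tgt:  # ignore self-links
                mentioned_by[tgt].add(src)

    # Co-mention: any two distinct sources that share a target.
    co_mention = set()
    for sources in mentioned_by.values():
        for a, b in combinations(sorted(sources), 2):
            co_mention.add((a, b))

    return dual_link, co_mention


# Toy graph: A and B link to each other; A and D both link to C.
toy_links = {"A": {"B", "C"}, "B": {"A"}, "D": {"C"}, "C": set()}
dual, co = mine_pairs(toy_links)
print(dual)  # {('A', 'B')}
print(co)    # {('A', 'D')}
```

In a pre-training setup of the kind the abstract describes, pairs like these would presumably serve as pseudo question-passage examples for the dense retriever; how the actual method selects, filters, and samples them is not specified in this listing.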

* Accepted to the ACL 2022 main conference. The dataset and code are available at https://github.com/jzhoubu/HLP
