Minghan Li

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction

Feb 13, 2023

Xinyu Zhang, Minghan Li, Jimmy Lin

Abstract: Recent progress in information retrieval finds that embedding query and document representations into multiple vectors yields a bi-encoder retriever that is robust on out-of-distribution datasets. In this paper, we explore whether late interaction, the simplest form of multi-vector representation, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score. Although, intuitively, the attention mechanism of rerankers at the previous layers already gathers the token-level information, we find that adding late interaction still brings an extra 5% improvement on average on out-of-distribution datasets, with little increase in latency and no degradation in in-domain effectiveness. Through extensive experiments and analysis, we show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures, and that the improvement is more prominent on longer queries.
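
As a rough illustration of what "late interaction" means in the abstract above, the following is a minimal sketch of ColBERT-style MaxSim scoring: each query token vector is matched with its best document token vector and the maxima are summed. It is an assumption that this is the form used; the abstract does not say how the score is combined with the [CLS] vector, and the function name and toy vectors are hypothetical.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for every query token vector, take the
    maximum dot product against all document token vectors, then sum."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.1, 0.1], [0.3, 0.2]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here doc_a scores higher than doc_b because each query token finds a close match among its token vectors.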

Domain Adaptation for Dense Retrieval through Self-Supervision by Pseudo-Relevance Labeling

Dec 13, 2022

Minghan Li, Eric Gaussier

Abstract: Although neural information retrieval has witnessed great improvements, recent work has shown that the generalization ability of dense retrieval models on target domains with different distributions is limited, which contrasts with the results obtained with interaction-based models. To address this issue, researchers have resorted to adversarial learning and query generation approaches; both approaches nevertheless resulted in limited improvements. In this paper, we propose to use a self-supervision approach in which pseudo-relevance labels are automatically generated on the target domain. To do so, we first use the standard BM25 model on the target domain to obtain a first ranking of documents, and then use the interaction-based model T5-3B to re-rank the top documents. We further combine this approach with knowledge distillation relying on an interaction-based teacher model trained on the source domain. Our experiments reveal that pseudo-relevance labeling using T5-3B and the MiniLM teacher performs on average better than other approaches and helps improve the state-of-the-art query generation approach GPL when it is fine-tuned on the pseudo-relevance labeled data.

* 16 pages

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

Nov 18, 2022

Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen

Abstract: Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval. CITADEL learns to route different token vectors to the predicted lexical "keys" such that a query token vector only interacts with document token vectors routed to the same key. This design significantly reduces the computation cost while maintaining high accuracy. Notably, CITADEL achieves the same or slightly better performance than the previous state of the art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR) evaluations, while being nearly 40 times faster. Code and data are available at https://github.com/facebookresearch/dpr-scale.
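
The routing idea in the abstract above, that a query token vector only interacts with document token vectors routed to the same key, can be sketched as follows. This is a simplified illustration, not the CITADEL implementation: it assumes each token has already been assigned exactly one key, whereas the abstract describes learned routing, and the function name, keys, and vectors are all hypothetical.

```python
from collections import defaultdict


def routed_score(query_tokens, doc_tokens):
    """Score a document where each token is a (key, vector) pair.
    A query token is only compared with document tokens sharing its key."""
    # Group document token vectors by their routing key.
    by_key = defaultdict(list)
    for key, vec in doc_tokens:
        by_key[key].append(vec)

    total = 0.0
    for key, q in query_tokens:
        candidates = by_key.get(key)
        if candidates:  # keys with no routed document tokens contribute nothing
            total += max(sum(a * b for a, b in zip(q, d)) for d in candidates)
    return total


query = [("apple", [1.0, 0.0]), ("pie", [0.0, 1.0])]
doc = [("apple", [0.5, 0.5]), ("apple", [0.8, 0.1]), ("tree", [0.0, 0.9])]
total = routed_score(query, doc)
```

In this toy case only the two "apple" document vectors are ever compared with the query, which is where the reduction in computation comes from.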

Query Expansion Using Contextual Clue Sampling with Language Models

Oct 13, 2022

Linqing Liu, Minghan Li, Jimmy Lin, Sebastian Riedel, Pontus Stenetorp

Abstract: Query expansion is an effective approach for mitigating vocabulary mismatch between queries and documents in information retrieval. One recent line of research uses language models to generate query-related contexts for expansion. Along this line, we argue that expansion terms from these contexts should balance two key aspects: diversity and relevance. The obvious way to increase diversity is to sample multiple contexts from the language model. However, this comes at the cost of relevance, because there is a well-known tendency of models to hallucinate incorrect or irrelevant contexts. To balance these two considerations, we propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context. Our lexical-matching-based approach achieves similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR, while reducing the index size by more than 96%. For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.

Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval

Jul 31, 2022

Sheng-Chieh Lin, Minghan Li, Jimmy Lin

Abstract: Pre-trained transformers have achieved success in many NLP tasks. One thread of work focuses on training bi-encoder models (i.e., dense retrievers) to effectively encode sentences or passages into single dense vectors for efficient approximate nearest neighbor (ANN) search. However, recent work has demonstrated that transformers pre-trained with masked language modeling (MLM) are not capable of effectively aggregating text information into a single dense vector due to the task mismatch between pre-training and fine-tuning. Therefore, computationally expensive techniques have been adopted to train dense retrievers, such as large batch sizes, knowledge distillation or post pre-training. In this work, we present a simple approach to effectively aggregate textual representation from the pre-trained transformer into a dense vector. Extensive experiments show that our approach improves the robustness of the single-vector approach under both in-domain and zero-shot evaluations without any computationally expensive training techniques. Our work demonstrates that MLM pre-trained transformers can be used to effectively encode text information into a single vector for dense retrieval. Code is available at: https://github.com/castorini/dhr

* 12 pages

Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking

May 19, 2022

Minghan Li, Xinyu Zhang, Ji Xin, Hongyang Zhang, Jimmy Lin

Abstract: In information retrieval (IR), candidate set pruning has been commonly used to speed up two-stage relevance ranking. However, such an approach lacks accurate error control and often trades accuracy off against computational efficiency in an empirical fashion, without theoretical guarantees. In this paper, we propose the concept of certified error control of candidate set pruning for relevance ranking, which means that the test error after pruning is guaranteed to be controlled under a user-specified threshold with high probability. Both in-domain and out-of-domain experiments show that our method successfully prunes the first-stage retrieved candidate sets to improve the second-stage reranking speed while satisfying the pre-specified accuracy constraints in both settings. For example, on MS MARCO Passage v1, our method yields an average candidate set size of 27 out of 1,000, which increases the reranking speed by about 37 times, while the MRR@10 is greater than a pre-specified value of 0.38 with about 90% empirical coverage; the empirical baselines fail to provide such a guarantee. Code and data are available at: https://github.com/alexlimh/CEC-Ranking.

Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization

Mar 25, 2022

Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, Lei Zhang

Abstract: Arbitrary style transfer (AST) and domain generalization (DG) are important yet challenging visual learning tasks, which can be cast as a feature distribution matching problem. With the assumption of Gaussian feature distribution, conventional feature distribution matching methods usually match the mean and standard deviation of features. However, the feature distributions of real-world data are usually much more complicated than Gaussian and cannot be accurately matched by using only the first-order and second-order statistics, while it is computationally prohibitive to use high-order statistics for distribution matching. In this work, to the best of our knowledge for the first time, we propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which can be implemented by applying Exact Histogram Matching (EHM) in the image feature space. In particular, a fast EHM algorithm, named Sort-Matching, is employed to perform EFDM in a plug-and-play manner with minimal cost. The effectiveness of our proposed EFDM method is verified on a variety of AST and DG tasks, demonstrating new state-of-the-art results. Code is available at https://github.com/YBZh/EFDM.

* CVPR2022 camera ready
* To appear in CVPR2022; codes and supplementary material are available at: https://github.com/YBZh/EFDM
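
To make the Sort-Matching idea from the abstract above concrete, here is a minimal sketch on flat lists of feature values, assuming both lists have the same length. The element of the content features with rank k is replaced by the k-th smallest style value, so the output has exactly the style's empirical distribution while keeping the content's ordering. The function name and toy values are hypothetical, and training details such as gradient handling in the released code are not covered.

```python
def sort_matching(content, style):
    """Exact histogram matching by sorting: the output takes its values
    from `style` and its rank order from `content`."""
    if len(content) != len(style):
        raise ValueError("content and style must have the same length")
    # Indices of content values from smallest to largest.
    order = sorted(range(len(content)), key=lambda i: content[i])
    sorted_style = sorted(style)
    matched = [0.0] * len(content)
    for rank, idx in enumerate(order):
        matched[idx] = sorted_style[rank]
    return matched


content = [0.3, -1.2, 2.0, 0.7]
style = [10.0, 40.0, 20.0, 30.0]
matched = sort_matching(content, style)
```

The result contains the same values as the style list, so every statistic of the style distribution is matched, not just the mean and standard deviation.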

One-stage Video Instance Segmentation: From Frame-in Frame-out to Clip-in Clip-out

Mar 12, 2022

Minghan Li, Lei Zhang

Abstract: Many video instance segmentation (VIS) methods partition a video sequence into individual frames to detect and segment objects frame by frame. However, such a frame-in frame-out (FiFo) pipeline is ineffective at exploiting temporal information. Based on the fact that adjacent frames in a short clip are highly coherent in content, we propose to extend the one-stage FiFo framework to a clip-in clip-out (CiCo) one, which performs VIS clip by clip. Specifically, we stack FPN features of all frames in a short video clip to build a spatio-temporal feature cube, and replace the 2D conv layers in the prediction heads and the mask branch with 3D conv layers, forming clip-level prediction heads (CPH) and clip-level mask heads (CMH). Then the clip-level masks of an instance can be generated by feeding its box-level predictions from CPH and clip-level features from CMH into a small fully convolutional network. A clip-level segmentation loss is proposed to ensure that the generated instance masks are temporally coherent in the clip. The proposed CiCo strategy is free of inter-frame alignment, and can be easily embedded into existing FiFo-based VIS approaches. To validate the generality and effectiveness of our CiCo strategy, we apply it to two representative FiFo methods, Yolact and CondInst, resulting in two new one-stage VIS models, namely CiCo-Yolact and CiCo-CondInst, which achieve 37.1/37.3%, 35.2/35.4% and 17.2/18.0% mask AP using the ResNet50 backbone, and 41.8/41.4%, 38.0/38.9% and 18.0/18.2% mask AP using the Swin Transformer tiny backbone on the YouTube-VIS 2019, 2021 and OVIS valid sets, respectively, setting new state-of-the-art results. Code and video demos of CiCo can be found at https://github.com/MinghanLi/CiCo.

* 20 pages

The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

Nov 18, 2021

Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier

Abstract: On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the self-attention mechanism, however, such models have difficulties processing long documents. Recent works dealing with this issue include truncating long documents, segmenting them into passages that can be treated by a standard BERT model, or modifying the self-attention mechanism to make it sparser as in sparse-attention models. However, these approaches either lose information or have high computational complexity (and are time, memory and energy consuming in the latter case). We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then a few blocks are aggregated to form a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
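
The two-step procedure described in the abstract above (pre-rank blocks against the query, then aggregate a few blocks into a short document) can be sketched as below. This is only an illustration under stated assumptions: the abstract does not specify the block size, the number of blocks kept, or the pre-ranking function, so the fixed-size word blocks, the `top_k` value, and the query-term overlap score used here are all hypothetical stand-ins.

```python
def select_key_blocks(query, document, block_size=5, top_k=2):
    """Split a document into word blocks, score each block against the
    query, and join the top_k blocks in their original order."""
    words = document.split()
    blocks = [words[i:i + block_size] for i in range(0, len(words), block_size)]
    query_terms = set(query.lower().split())

    def overlap(block):
        # Stand-in pre-ranking score: number of block words found in the query.
        return sum(1 for w in block if w.lower() in query_terms)

    ranked = sorted(range(len(blocks)), key=lambda i: overlap(blocks[i]), reverse=True)
    kept = sorted(ranked[:top_k])  # restore document order
    return " ".join(" ".join(blocks[i]) for i in kept)


long_doc = (
    "the cat sat on mat "
    "stock prices rose sharply today "
    "neural rerankers score query passages "
    "weather is sunny and warm "
    "dense retrieval encodes query text"
)
short_doc = select_key_blocks("neural query retrieval", long_doc)
```

The shortened text could then be passed to a length-limited model such as BERT, which is the role the abstract assigns to the aggregated blocks.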

Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering

Oct 04, 2021

Minghan Li, Jimmy Lin

Abstract: One key feature of dense passage retrievers (DPR) is the use of separate question and passage encoders in a bi-encoder design. Previous work on the generalization of DPR mainly focuses on testing both encoders in tandem on out-of-distribution (OOD) question-answering (QA) tasks, which is also known as domain adaptation. However, it is still unknown how DPR's individual question/passage encoder affects generalization. Specifically, in this paper, we want to know how an in-distribution (IND) question/passage encoder would generalize if paired with an OOD passage/question encoder from another domain. We refer to this challenge as encoder adaptation. To answer this question, we inspect different combinations of DPR's question and passage encoders learned from five benchmark QA datasets on both in-domain and out-of-domain questions. We find that the passage encoder has more influence on the lower bound of generalization while the question encoder seems to affect the upper bound in general. For example, applying an OOD passage encoder usually hurts the retrieval accuracy while an OOD question encoder sometimes even improves the accuracy.
