Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qi Chen

Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

May 08, 2022
Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, Suhas Jayaram Subramanya, Jingdong Wang

Figure 1 for Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

Figure 2 for Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

Figure 3 for Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

Figure 4 for Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

Despite the broad range of algorithms for Approximate Nearest Neighbor Search, most empirical evaluations of algorithms have focused on smaller datasets, typically of 1 million points~\citep{Benchmark}. However, deploying recent advances in embedding based techniques for search, recommendation and ranking at scale require ANNS indices at billion, trillion or larger scale. Barring a few recent papers, there is limited consensus on which algorithms are effective at this scale vis-\`a-vis their hardware cost. This competition compares ANNS algorithms at billion-scale by hardware cost, accuracy and performance. We set up an open source evaluation framework and leaderboards for both standardized and specialized hardware. The competition involves three tracks. The standard hardware track T1 evaluates algorithms on an Azure VM with limited DRAM, often the bottleneck in serving billion-scale indices, where the embedding data can be hundreds of GigaBytes in size. It uses FAISS~\citep{Faiss17} as the baseline. The standard hardware track T2 additional allows inexpensive SSDs in addition to the limited DRAM and uses DiskANN~\citep{DiskANN19} as the baseline. The specialized hardware track T3 allows any hardware configuration, and again uses FAISS as the baseline. We compiled six diverse billion-scale datasets, four newly released for this competition, that span a variety of modalities, data types, dimensions, deep learning models, distance functions and sources. The outcome of the competition was ranked leaderboards of algorithms in each track based on recall at a query throughput threshold. Additionally, for track T3, separate leaderboards were created based on recall as well as cost-normalized and power-normalized query throughput.

Via

Access Paper or Ask Questions

Proposal-free Lidar Panoptic Segmentation with Pillar-level Affinity

Apr 19, 2022
Qi Chen, Sourabh Vora

Figure 1 for Proposal-free Lidar Panoptic Segmentation with Pillar-level Affinity

Figure 2 for Proposal-free Lidar Panoptic Segmentation with Pillar-level Affinity

Figure 3 for Proposal-free Lidar Panoptic Segmentation with Pillar-level Affinity

Figure 4 for Proposal-free Lidar Panoptic Segmentation with Pillar-level Affinity

We propose a simple yet effective proposal-free architecture for lidar panoptic segmentation. We jointly optimize both semantic segmentation and class-agnostic instance classification in a single network using a pillar-based bird's-eye view representation. The instance classification head learns pairwise affinity between pillars to determine whether the pillars belong to the same instance or not. We further propose a local clustering algorithm to propagate instance ids by merging semantic segmentation and affinity predictions. Our experiments on nuScenes dataset show that our approach outperforms previous proposal-free methods and is comparable to proposal-based methods which requires extra annotation from object detection.

* CVPRW 2022 Workshop on Autonomous Driving

Via

Access Paper or Ask Questions

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Apr 01, 2022
Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, Yeyun Gong, Qi Chen, Fan Yang, Hao Sun, Yingxia Shao, Denvy Deng, Qi Zhang, Xing Xie

Figure 1 for Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Figure 2 for Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Figure 3 for Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Figure 4 for Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Vector quantization (VQ) based ANN indexes, such as Inverted File System (IVF) and Product Quantization (PQ), have been widely applied to embedding based document retrieval thanks to the competitive time and memory efficiency. Originally, VQ is learned to minimize the reconstruction loss, i.e., the distortions between the original dense embeddings and the reconstructed embeddings after quantization. Unfortunately, such an objective is inconsistent with the goal of selecting ground-truth documents for the input query, which may cause severe loss of retrieval quality. Recent works identify such a defect, and propose to minimize the retrieval loss through contrastive learning. However, these methods intensively rely on queries with ground-truth documents, whose performance is limited by the insufficiency of labeled data. In this paper, we propose Distill-VQ, which unifies the learning of IVF and PQ within a knowledge distillation framework. In Distill-VQ, the dense embeddings are leveraged as "teachers", which predict the query's relevance to the sampled documents. The VQ modules are treated as the "students", which are learned to reproduce the predicted relevance, such that the reconstructed embeddings may fully preserve the retrieval result of the dense embeddings. By doing so, Distill-VQ is able to derive substantial training signals from the massive unlabeled data, which significantly contributes to the retrieval quality. We perform comprehensive explorations for the optimal conduct of knowledge distillation, which may provide useful insights for the learning of VQ based ANN index. We also experimentally show that the labeled data is no longer a necessity for high-quality vector quantization, which indicates Distill-VQ's strong applicability in practice.

* Accepted by SIGIR 2022

Via

Access Paper or Ask Questions

Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation

Mar 06, 2022
Qi Chen, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie

Figure 1 for Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation

Figure 2 for Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation

Figure 3 for Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation

Figure 4 for Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation

Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels has attracted much attention due to low annotation costs. Existing methods often rely on Class Activation Mapping (CAM) that measures the correlation between image pixels and classifier weight. However, the classifier focuses only on the discriminative regions while ignoring other useful information in each image, resulting in incomplete localization maps. To address this issue, we propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss. Specifically, IPE tailors prototypes for every image to capture complete regions, formed our Image-Specific CAM (IS-CAM), which is realized by two sequential steps. In addition, GSC is proposed to construct the consistency of general CAM and our specific IS-CAM, which further optimizes the feature representation and empowers a self-correction ability of prototype exploration. Extensive experiments are conducted on PASCAL VOC 2012 and MS COCO 2014 segmentation benchmark and results show our SIPE achieves new state-of-the-art performance using only image-level labels. The code is available at https://github.com/chenqi1126/SIPE.

* Accepted to CVPR 2022

Via

Access Paper or Ask Questions

V2C: Visual Voice Cloning

Nov 25, 2021
Qi Chen, Yuanqing Li, Yuankai Qi, Jiaqiu Zhou, Mingkui Tan, Qi Wu

Existing Voice Cloning (VC) tasks aim to convert a paragraph text to a speech with desired voice specified by a reference audio. This has significantly boosted the development of artificial speech applications. However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to be with emotions consistent with the movie plots. To fill this gap, in this work we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to a speech with both desired voice specified by a reference audio and desired emotion specified by a reference video. To facilitate research in this field, we construct a dataset, V2C-Animation, and propose a strong baseline based on existing state-of-the-art (SoTA) VC techniques. Our dataset contains 10,217 animated movie clips covering a large variety of genres (e.g., Comedy, Fantasy) and emotions (e.g., happy, sad). We further design a set of evaluation metrics, named MCD-DTW-SL, which help evaluate the similarity between ground-truth speeches and the synthesised ones. Extensive experimental results show that even SoTA VC methods cannot generate satisfying speeches for our V2C task. We hope the proposed new task together with the constructed dataset and evaluation metric will facilitate the research in the field of voice cloning and the broader vision-and-language community.

* 15 pages, 14 figures

Via

Access Paper or Ask Questions

TraSw: Tracklet-Switch Adversarial Attacks against Multi-Object Tracking

Nov 17, 2021
Delv Lin, Qi Chen, Chengyu Zhou, Kun He

Figure 1 for TraSw: Tracklet-Switch Adversarial Attacks against Multi-Object Tracking

Figure 2 for TraSw: Tracklet-Switch Adversarial Attacks against Multi-Object Tracking

Figure 3 for TraSw: Tracklet-Switch Adversarial Attacks against Multi-Object Tracking

Figure 4 for TraSw: Tracklet-Switch Adversarial Attacks against Multi-Object Tracking

Benefiting from the development of Deep Neural Networks, Multi-Object Tracking (MOT) has achieved aggressive progress. Currently, the real-time Joint-Detection-Tracking (JDT) based MOT trackers gain increasing attention and derive many excellent models. However, the robustness of JDT trackers is rarely studied, and it is challenging to attack the MOT system since its mature association algorithms are designed to be robust against errors during tracking. In this work, we analyze the weakness of JDT trackers and propose a novel adversarial attack method, called Tracklet-Switch (TraSw), against the complete tracking pipeline of MOT. Specifically, a push-pull loss and a center leaping optimization are designed to generate adversarial examples for both re-ID feature and object detection. TraSw can fool the tracker to fail to track the targets in the subsequent frames by attacking very few frames. We evaluate our method on the advanced deep trackers (i.e., FairMOT, JDE, ByteTrack) using the MOT-Challenge datasets (i.e., 2DMOT15, MOT17, and MOT20). Experiments show that TraSw can achieve a high success rate of over 95% by attacking only five frames on average for the single-target attack and a reasonably high success rate of over 80% for the multiple-target attack. The code is available at https://github.com/DerryHub/FairMOT-attack .

Via

Access Paper or Ask Questions

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

Nov 05, 2021
Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, Jingdong Wang

Figure 1 for SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

Figure 2 for SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

Figure 3 for SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

Figure 4 for SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

The in-memory algorithms for approximate nearest neighbor search (ANNS) have achieved great success for fast high-recall search, but are extremely expensive when handling very large scale database. Thus, there is an increasing request for the hybrid ANNS solutions with small memory and inexpensive solid-state drive (SSD). In this paper, we present a simple but efficient memory-disk hybrid indexing and search system, named SPANN, that follows the inverted index methodology. It stores the centroid points of the posting lists in the memory and the large posting lists in the disk. We guarantee both disk-access efficiency (low latency) and high recall by effectively reducing the disk-access number and retrieving high-quality posting lists. In the index-building stage, we adopt a hierarchical balanced clustering algorithm to balance the length of posting lists and augment the posting list by adding the points in the closure of the corresponding clusters. In the search stage, we use a query-aware scheme to dynamically prune the access of unnecessary posting lists. Experiment results demonstrate that SPANN is 2$\times$ faster than the state-of-the-art ANNS solution DiskANN to reach the same recall quality $90\%$ with same memory cost in three billion-scale datasets. It can reach $90\%$ recall@1 and recall@10 in just around one millisecond with only 32GB memory cost. Code is available at: {\footnotesize\color{blue}{\url{https://github.com/microsoft/SPTAG}}}.

* Accepted to NeurIPS 2021

Via

Access Paper or Ask Questions

Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

Sep 29, 2021
Qi Chen, Changjian Shui, Mario Marchand

Figure 1 for Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

Figure 2 for Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

Figure 3 for Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

Figure 4 for Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

We derive a novel information-theoretic analysis of the generalization property of meta-learning algorithms. Concretely, our analysis proposes a generic understanding of both the conventional learning-to-learn framework and the modern model-agnostic meta-learning (MAML) algorithms. Moreover, we provide a data-dependent generalization bound for a stochastic variant of MAML, which is non-vacuous for deep few-shot learning. As compared to previous bounds that depend on the square norm of gradients, empirical validations on both simulated data and a well-known few-shot benchmark show that our bound is orders of magnitude tighter in most situations.

Via

Access Paper or Ask Questions

PolarStream: Streaming Lidar Object Detection and Segmentation with Polar Pillars

Jun 14, 2021
Qi Chen, Sourabh Vora, Oscar Beijbom

Figure 1 for PolarStream: Streaming Lidar Object Detection and Segmentation with Polar Pillars

Figure 2 for PolarStream: Streaming Lidar Object Detection and Segmentation with Polar Pillars

Figure 3 for PolarStream: Streaming Lidar Object Detection and Segmentation with Polar Pillars

Figure 4 for PolarStream: Streaming Lidar Object Detection and Segmentation with Polar Pillars

Recent works recognized lidars as an inherently streaming data source and showed that the end-to-end latency of lidar perception models can be reduced significantly by operating on wedge-shaped point cloud sectors rather then the full point cloud. However, due to use of cartesian coordinate systems these methods represent the sectors as rectangular regions, wasting memory and compute. In this work we propose using a polar coordinate system and make two key improvements on this design. First, we increase the spatial context by using multi-scale padding from neighboring sectors: preceding sector from the current scan and/or the following sector from the past scan. Second, we improve the core polar convolutional architecture by introducing feature undistortion and range stratified convolutions. Experimental results on the nuScenes dataset show significant improvements over other streaming based methods. We also achieve comparable results to existing non-streaming methods but with lower latencies.

Via

Access Paper or Ask Questions

Contrastive Neural Architecture Search with Neural Architecture Comparators

Apr 06, 2021
Yaofo Chen, Yong Guo, Qi Chen, Minli Li, Wei Zeng, Yaowei Wang, Mingkui Tan

Figure 1 for Contrastive Neural Architecture Search with Neural Architecture Comparators

Figure 2 for Contrastive Neural Architecture Search with Neural Architecture Comparators

Figure 3 for Contrastive Neural Architecture Search with Neural Architecture Comparators

Figure 4 for Contrastive Neural Architecture Search with Neural Architecture Comparators

One of the key steps in Neural Architecture Search (NAS) is to estimate the performance of candidate architectures. Existing methods either directly use the validation performance or learn a predictor to estimate the performance. However, these methods can be either computationally expensive or very inaccurate, which may severely affect the search efficiency and performance. Moreover, as it is very difficult to annotate architectures with accurate performance on specific tasks, learning a promising performance predictor is often non-trivial due to the lack of labeled data. In this paper, we argue that it may not be necessary to estimate the absolute performance for NAS. On the contrary, we may need only to understand whether an architecture is better than a baseline one. However, how to exploit this comparison information as the reward and how to well use the limited labeled data remains two great challenges. In this paper, we propose a novel Contrastive Neural Architecture Search (CTNAS) method which performs architecture search by taking the comparison results between architectures as the reward. Specifically, we design and learn a Neural Architecture Comparator (NAC) to compute the probability of candidate architectures being better than a baseline one. Moreover, we present a baseline updating scheme to improve the baseline iteratively in a curriculum learning manner. More critically, we theoretically show that learning NAC is equivalent to optimizing the ranking over architectures. Extensive experiments in three search spaces demonstrate the superiority of our CTNAS over existing methods.

* Accpeted by CVPR 2021. The code is available at https://github.com/chenyaofo/CTNAS

Via

Access Paper or Ask Questions