Jiashi Feng

The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation

Jul 26, 2020
Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, Jiashi Feng

Most existing object instance detection and segmentation models only work well on fairly balanced benchmarks where per-category training sample numbers are comparable, such as COCO. They tend to suffer a performance drop on realistic datasets, which are usually long-tailed. This work aims to study and address such open challenges. Specifically, we systematically investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and reveal that a major cause is the inaccurate classification of object proposals. Based on this observation, we first consider various techniques for improving long-tail classification performance, which indeed enhance instance segmentation results. We then propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach. Without bells and whistles, it significantly boosts the performance of instance segmentation for tail classes on the recent LVIS dataset and our sampled COCO-LT dataset. Our analysis provides useful insights for solving long-tail instance detection and segmentation problems, and the straightforward SimCal method can serve as a simple but strong baseline. With this method, we won the 2019 LVIS challenge. Code and models are available at https://github.com/twangnh/SimCal.

* LVIS 2019 challenge winner, performance significantly improved after challenge submission, accepted at ECCV 2020
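
The abstract names a bi-level class balanced sampling approach but does not spell out its steps. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: level one picks a class uniformly at random, level two picks an image containing that class, and only the instances of that class in the image are kept. The function names, the toy annotations, and the choice to keep only same-class instances are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_class_index(annotations):
    """Map each class name to the sorted list of image ids that contain it."""
    index = defaultdict(set)
    for image_id, labels in annotations.items():
        for label in labels:
            index[label].add(image_id)
    return {cls: sorted(ids) for cls, ids in index.items()}


def bilevel_sample(annotations, class_index, rng):
    """Draw one training unit: (class, image id, positions of that class)."""
    # Level 1 (assumed): choose a class uniformly, ignoring its frequency.
    cls = rng.choice(sorted(class_index))
    # Level 2 (assumed): choose uniformly among images containing that class.
    image_id = rng.choice(class_index[cls])
    # Keep only the instances of the sampled class inside the chosen image.
    instances = [i for i, lab in enumerate(annotations[image_id]) if lab == cls]
    return cls, image_id, instances


# Hypothetical long-tailed toy data: "person" is frequent, "unicycle" is rare.
annotations = {
    "img0": ["person", "person", "car"],
    "img1": ["person", "person", "person"],
    "img2": ["person", "car"],
    "img3": ["person", "unicycle"],
}
class_index = build_class_index(annotations)
rng = random.Random(0)
draws = Counter(bilevel_sample(annotations, class_index, rng)[0] for _ in range(3000))
```

Under this scheme each class is drawn about equally often regardless of how many instances it has, which is the property a calibration stage for the classification head would rely on.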

Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition

Jul 12, 2020
Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, Jiashi Feng

We consider the problem of semi-supervised 3D action recognition, which has rarely been explored before. Its major challenge lies in how to effectively learn motion representations from unlabeled data. Self-supervised learning (SSL) has proven very effective at learning representations from unlabeled data in the image domain. However, few effective self-supervised approaches exist for 3D action recognition, and directly applying SSL for semi-supervised learning suffers from misalignment of the representations learned from SSL and supervised learning tasks. To address these issues, we present Adversarial Self-Supervised Learning (ASSL), a novel framework that tightly couples SSL and the semi-supervised scheme via neighbor relation exploration and adversarial learning. Specifically, we design an effective SSL scheme to improve the discrimination capability of learned representations for 3D action recognition by exploring the data relations within a neighborhood. We further propose an adversarial regularization to align the feature distributions of labeled and unlabeled samples. To demonstrate the effectiveness of the proposed ASSL in semi-supervised 3D action recognition, we conduct extensive experiments on the NTU and N-UCLA datasets. The results confirm its advantageous performance over state-of-the-art semi-supervised methods in the few-label regime for 3D action recognition.

* Accepted by ECCV 2020

Combating Domain Shift with Self-Taught Labeling

Jul 08, 2020
Jian Liang, Dapeng Hu, Jiashi Feng

We present a novel method to combat domain shift when adapting classification models trained on one domain to other new domains with few or no target labels. In the existing literature, a prevailing solution paradigm is to learn domain-invariant feature representations so that a classifier learned on the source features generalizes well to the target features. However, such a classifier is inevitably biased to the source domain because it overlooks the structure of the target data. Instead, we propose Self-Taught Labeling (SeTL), a new regularization approach that finds an auxiliary target-specific classifier for unlabeled data. During adaptation, this classifier is able to teach the target domain itself by providing unbiased accurate pseudo labels. In particular, for each target sample, we employ a memory bank to store its feature along with its soft label from the domain-shared classifier. Then we develop a non-parametric neighborhood aggregation strategy to generate new pseudo labels as well as confidence weights for unlabeled data. Though simply using the standard classification objective, SeTL significantly outperforms existing domain alignment techniques on a large variety of domain adaptation benchmarks. We expect that SeTL can provide a new perspective on addressing domain shift and inspire future research on domain adaptation and transfer learning.
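
The memory bank and neighborhood aggregation described above can be sketched as follows. This is a hedged illustration, not SeTL's actual procedure: the abstract does not state the similarity measure, the neighborhood size, or how confidence weights are computed, so cosine similarity, `k=3`, plain averaging of neighbor soft labels, and "confidence = largest averaged probability" are all assumptions, as are the function names and toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def aggregate_pseudo_label(query, bank_features, bank_soft_labels, k=3):
    """Average the soft labels of the k most similar memory-bank entries.

    Returns (pseudo label, confidence weight, averaged distribution).
    """
    sims = [cosine(query, feat) for feat in bank_features]
    neighbors = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    num_classes = len(bank_soft_labels[0])
    mean = [
        sum(bank_soft_labels[i][c] for i in neighbors) / len(neighbors)
        for c in range(num_classes)
    ]
    label = max(range(num_classes), key=lambda c: mean[c])
    return label, mean[label], mean


# Hypothetical memory bank: 2-D features with 2-class soft labels.
bank_features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
bank_soft_labels = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]

label, confidence, distribution = aggregate_pseudo_label(
    [1.0, 0.05], bank_features, bank_soft_labels
)
```

In this toy bank the third entry's own soft label leans toward class 1, yet its feature sits among class-0 samples; averaging over the neighborhood assigns class 0, which illustrates how target structure can correct a single noisy prediction.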

Rethinking Bottleneck Structure for Efficient Mobile Network Design

Jul 07, 2020
Zhou Daquan, Qibin Hou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

The inverted residual block has recently dominated architecture design for mobile networks. It changes the classic residual bottleneck by introducing two design rules: learning inverted residuals and using linear bottlenecks. In this paper, we rethink the necessity of such design changes and find that they may bring risks of information loss and gradient confusion. We thus propose to flip the structure and present a novel bottleneck design, called the sandglass block, that performs identity mapping and spatial transformation at higher dimensions and thus alleviates information loss and gradient confusion effectively. Extensive experiments demonstrate that, contrary to common belief, such a bottleneck structure is more beneficial than the inverted one for mobile networks. In ImageNet classification, by simply replacing the inverted residual block with our sandglass block without increasing parameters and computation, the classification accuracy can be improved by more than 1.7% over MobileNetV2. On the Pascal VOC 2007 test set, we also observe a 0.9% mAP improvement in object detection. We further verify the effectiveness of the sandglass block by adding it into the search space of the neural architecture search method DARTS. With a 25% parameter reduction, the classification accuracy is improved by 0.13% over previous DARTS models. Code can be found at: https://github.com/zhoudaquan/rethinking_bottleneck_design.

* 24 pages, published as an ECCV 2020 conference paper
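
To make the contrast between the two block designs concrete, the sketch below traces channel widths through each block in plain Python. The abstract only says that the sandglass block performs identity mapping and spatial transformation at higher dimensions; the exact layer ordering used here (depthwise, pointwise reduce, pointwise expand, depthwise) and the example widths 24, 144 and ratio 6 are assumptions for illustration, and normalization and activation layers are omitted.

```python
def inverted_residual(c_low, expand):
    """Layer trace of a MobileNetV2-style inverted residual block."""
    c_high = c_low * expand
    layers = [
        ("1x1 conv (expand)", c_low, c_high),
        ("3x3 depthwise conv", c_high, c_high),
        ("1x1 conv (linear project)", c_high, c_low),
    ]
    # The shortcut connects the narrow input to the narrow output.
    return layers, c_low


def sandglass(c_high, reduce):
    """Layer trace of a sandglass-style block (assumed layer order)."""
    c_low = c_high // reduce
    layers = [
        ("3x3 depthwise conv", c_high, c_high),
        ("1x1 conv (linear reduce)", c_high, c_low),
        ("1x1 conv (expand)", c_low, c_high),
        ("3x3 depthwise conv", c_high, c_high),
    ]
    # The shortcut connects the wide input to the wide output.
    return layers, c_high


def spatial_widths(layers):
    """Channel widths at which the spatial (depthwise) convolutions run."""
    return [c_in for name, c_in, _ in layers if "depthwise" in name]


ir_layers, ir_shortcut = inverted_residual(c_low=24, expand=6)
sg_layers, sg_shortcut = sandglass(c_high=144, reduce=6)
```

In this trace the inverted residual block carries its shortcut at 24 channels, while the sandglass block carries both its shortcut and its depthwise convolutions at 144 channels.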

Local Grid Rendering Networks for 3D Object Detection in Point Clouds

Jul 04, 2020
Jianan Li, Jiashi Feng

The performance of 3D object detection models over point clouds depends heavily on their capability of modeling local geometric patterns. Conventional point-based models exploit local patterns through a symmetric function (e.g. max pooling) or based on graphs, which easily leads to loss of fine-grained geometric structures. For capturing spatial patterns, CNNs are powerful, but it would be computationally costly to directly apply convolutions on point data after voxelizing the entire point cloud to a dense regular 3D grid. In this work, we aim to improve the performance of point-based models by enhancing their pattern learning ability through leveraging CNNs while preserving computational efficiency. We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently, which allows small-size CNNs to accurately model local patterns and avoids convolutions over a dense grid to save computation cost. With the LGR operation, we introduce a new generic backbone called LGR-Net for point cloud feature extraction with simple design and high efficiency. We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets. It advances state-of-the-art results significantly by 5.5 and 4.5 mAP, respectively, with only slightly increased computation overhead.
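
The idea of rendering a point's small neighborhood into a low-resolution 3D grid can be illustrated with a simple occupancy-count version in Python. This is a simplified stand-in rather than the LGR operation itself: the abstract does not specify how points are rendered into cells, so hard binning into a cubic neighborhood, the count-valued cells, and the names `render_local_grid`, `radius`, and `resolution` are assumptions.

```python
def render_local_grid(points, center, radius, resolution):
    """Bin the points near `center` into a resolution^3 occupancy-count grid.

    The neighborhood is the axis-aligned cube of half-width `radius`
    around `center`; points outside it are ignored.
    """
    grid = [
        [[0] * resolution for _ in range(resolution)]
        for _ in range(resolution)
    ]
    cell = 2.0 * radius / resolution  # edge length of one grid cell
    for point in points:
        offset = [point[d] - center[d] for d in range(3)]
        if any(o < -radius or o >= radius for o in offset):
            continue  # outside the local neighborhood
        # Clamp guards against floating-point rounding at the upper edge.
        ix, iy, iz = (
            min(int((o + radius) / cell), resolution - 1) for o in offset
        )
        grid[ix][iy][iz] += 1
    return grid


# Hypothetical toy cloud: three points near the origin, one far away.
points = [(0.1, 0.1, 0.1), (-0.4, -0.4, -0.4), (0.45, 0.45, 0.45), (2.0, 0.0, 0.0)]
grid = render_local_grid(points, center=(0.0, 0.0, 0.0), radius=0.5, resolution=2)
```

Because each neighborhood yields a small fixed-size grid, a compact 3D CNN could be applied per neighborhood without voxelizing the whole scene, which is the efficiency argument made in the abstract.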

Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation

Jul 04, 2020
Jianfeng Zhang, Xuecheng Nie, Jiashi Feng

Existing 3D human pose estimation models suffer a performance drop when applied to new scenarios with unseen poses due to their limited generalizability. In this work, we propose a novel framework, Inference Stage Optimization (ISO), for improving the generalizability of 3D pose models when source and target data come from different pose distributions. Our main insight is that the target data, even though not labeled, carry valuable priors about their underlying distribution. To exploit such information, the proposed ISO performs geometry-aware self-supervised learning (SSL) on each single target instance and updates the 3D pose model before making a prediction. In this way, the model can mine distributional knowledge about the target scenario and quickly adapt to it with enhanced generalization performance. In addition, to handle sequential target data, we propose an online mode for implementing our ISO framework via streaming the SSL, which substantially enhances its effectiveness. We systematically analyze why and how our ISO framework works on diverse benchmarks under a cross-scenario setup. Remarkably, it yields a new state of the art of 83.6% 3D PCK on MPI-INF-3DHP, improving upon the previous best result by 9.7%. Code will be released.
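
For readers unfamiliar with the reported metric, 3D PCK (percentage of correct keypoints) counts a joint as correct when its predicted 3D position lies within a distance threshold of the ground truth. The Python sketch below shows the computation on made-up joints; the 150 mm default is the threshold commonly used with MPI-INF-3DHP and is an assumption here, since the abstract does not state it.

```python
import math


def pck_3d(predicted, ground_truth, threshold_mm=150.0):
    """Percentage of joints whose 3D error is at most `threshold_mm`."""
    correct = sum(
        1
        for pred, gt in zip(predicted, ground_truth)
        if math.dist(pred, gt) <= threshold_mm
    )
    return 100.0 * correct / len(ground_truth)


# Hypothetical joints in millimetres; per-joint errors are 10, 50, 149, 200.
ground_truth = [(0, 0, 0), (100, 0, 0), (0, 200, 0), (0, 0, 300)]
predicted = [(10, 0, 0), (100, 50, 0), (0, 200, 149), (200, 0, 300)]

score = pck_3d(predicted, ground_truth)
```

Three of the four toy joints fall within the threshold, so the score is 75.0.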

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax

Jun 18, 2020
Yu Li, Tao Wang, Bingyi Kang, Sheng Tang, Chunfeng Wang, Jintao Li, Jiashi Feng

Solving long-tail large vocabulary object detection with deep learning based models is a challenging and demanding task, yet it remains under-explored. In this work, we provide the first systematic analysis of the underperformance of state-of-the-art models when facing long-tail distributions. We find that existing detection methods are unable to model few-shot classes when the dataset is extremely skewed, which can result in classifier imbalance in terms of parameter magnitude. Directly adapting long-tail classification models to detection frameworks cannot solve this problem due to the intrinsic difference between detection and classification. We propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training. It implicitly modulates the training process for the head and tail classes and ensures they are both sufficiently trained, without requiring any extra sampling for the instances from the tail classes. Extensive experiments on the very recent long-tail large vocabulary object recognition benchmark LVIS show that our proposed BAGS significantly improves the performance of detectors with various backbones and frameworks on both object detection and instance segmentation. It beats all state-of-the-art methods transferred from long-tail image classification and establishes a new state of the art. Code is available at https://github.com/FishYuLi/BalancedGroupSoftmax.

* CVPR 2020 (Oral). Code is available at https://github.com/FishYuLi/BalancedGroupSoftmax
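
Group-wise training, as described in the abstract, can be sketched in Python as a softmax computed separately inside groups of classes with similar training-instance counts. This is a simplified illustration, not the BAGS module: the group boundaries `[10, 100, 1000]`, the function names, and the toy counts and logits are assumptions, and any extra per-group categories or background handling that the full method may use are left out.

```python
import math
from bisect import bisect_right


def assign_groups(instance_counts, boundaries):
    """Group index per class; `boundaries` must be sorted ascending."""
    return [bisect_right(boundaries, count) for count in instance_counts]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def group_softmax(logits, groups):
    """Normalize logits only against other classes in the same group."""
    probs = [0.0] * len(logits)
    for group in set(groups):
        members = [i for i, g in enumerate(groups) if g == group]
        for i, p in zip(members, softmax([logits[i] for i in members])):
            probs[i] = p
    return probs


# Hypothetical per-class training instance counts and classifier logits.
instance_counts = [5000, 2000, 50, 30, 3]
logits = [5.0, 3.0, 1.0, 1.0, 0.5]

groups = assign_groups(instance_counts, boundaries=[10, 100, 1000])
probs = group_softmax(logits, groups)
```

Because tail classes compete only with classes of similar frequency, large head-class logits no longer suppress them. In this bare sketch a class that is alone in its group always receives probability 1, so a practical version would need some additional per-group mechanism to let a group abstain.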

Multi-Miner: Object-Adaptive Region Mining for Weakly-Supervised Semantic Segmentation

Jun 14, 2020
Kuangqi Zhou, Qibin Hou, Zun Li, Jiashi Feng

Object region mining is a critical step for weakly-supervised semantic segmentation. Most recent methods mine the object regions by expanding the seed regions localized by class activation maps. They generally do not consider the sizes of objects and apply the same fixed procedure to mine all the object regions. Thus their mined regions are often insufficient in number and scale for large objects and, on the other hand, easily contaminated by surrounding backgrounds for small objects. In this paper, we propose a novel multi-miner framework to perform a region mining process that adapts to diverse object sizes and is thus able to mine more integral and finer object regions. Specifically, our multi-miner leverages a parallel modulator to check whether there are remaining object regions for each single object, and to guide a category-aware generator to mine the regions of each object independently. In this way, the multi-miner adaptively takes more steps for large objects and fewer steps for small objects. Experimental results demonstrate that the multi-miner offers better region mining results and helps achieve better segmentation performance than state-of-the-art weakly-supervised semantic segmentation methods.
