Kai Han

Dynamic Feature Pyramid Networks for Object Detection

Dec 01, 2020
Mingjian Zhu, Kai Han, Changbin Yu, Yunhe Wang


This paper studies the feature pyramid network (FPN), a widely used module for aggregating multi-scale feature information in object detection systems. The performance gains in most existing works come mainly at the cost of increased computation, especially in floating-point operations (FLOPs). In addition, the multi-scale information within each FPN layer has not been well investigated. To this end, we first introduce an inception FPN in which each layer contains convolution filters with different kernel sizes to enlarge the receptive field and integrate more useful information. Moreover, we point out that not all objects need such a complicated calculation module, and we propose a new dynamic FPN (DyFPN). Each layer in the DyFPN consists of multiple branches with different computational costs. Specifically, the output features of DyFPN are calculated by the branch adaptively selected through a learnable gating operation. The proposed method therefore provides more efficient dynamic inference, achieving a better trade-off between detection accuracy and computational cost. Extensive experiments on benchmarks demonstrate that the proposed DyFPN significantly improves performance through the optimal allocation of computational resources. For instance, replacing the FPN with the inception FPN improves detection accuracy by 1.6 AP using the Faster R-CNN paradigm on COCO minival, and the DyFPN further reduces about 40% of its FLOPs while maintaining similar performance.
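
A minimal sketch of the gating idea, assuming a hypothetical DynamicFPNLayer with one cheap 1x1 branch and one heavier inception-style branch; the gate design, Gumbel-softmax selection, and branch layout are our own illustration, not the authors' exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFPNLayer(nn.Module):
    """Illustrative gated FPN layer: a learnable gate picks between a
    cheap branch and a heavier multi-kernel (inception-style) branch."""
    def __init__(self, channels=256):
        super().__init__()
        self.cheap = nn.Conv2d(channels, channels, kernel_size=1)
        # Inception-style branch: parallel convs with different kernel sizes.
        self.heavy = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        self.gate = nn.Linear(channels, 2)  # scores for [cheap, heavy]

    def forward(self, x):
        # Global pooling -> gate logits; Gumbel-softmax gives a hard but
        # differentiable branch selection during training.
        g = self.gate(x.mean(dim=(2, 3)))              # (B, 2)
        sel = F.gumbel_softmax(g, tau=1.0, hard=True)  # (B, 2), one-hot
        cheap_out = self.cheap(x)
        heavy_out = sum(conv(x) for conv in self.heavy) / len(self.heavy)
        return (sel[:, 0, None, None, None] * cheap_out +
                sel[:, 1, None, None, None] * heavy_out)

layer = DynamicFPNLayer()
print(layer(torch.randn(2, 256, 32, 32)).shape)  # torch.Size([2, 256, 32, 32])
```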


Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets

Oct 28, 2020
Kai Han, Yunhe Wang, Qiulin Zhang, Wei Zhang, Chunjing Xu, Tong Zhang


To obtain excellent deep neural architectures, a series of techniques are carefully designed in EfficientNets. The giant formula for simultaneously enlarging resolution, depth and width provides us with a Rubik's cube for neural networks, allowing us to find networks with high efficiency and excellent performance by twisting the three dimensions. This paper explores the twisting rules for obtaining deep neural networks with minimal model sizes and computational costs. Unlike network enlarging, we observe that resolution and depth are more important than width for tiny networks, so the original method, i.e., the compound scaling of EfficientNet, is no longer suitable. To this end, we summarize a tiny formula for downsizing neural architectures through a series of smaller models derived from EfficientNet-B0 under a FLOPs constraint. Experimental results on the ImageNet benchmark show that our TinyNet performs much better than smaller versions of EfficientNets obtained with the inverted giant formula. For instance, our TinyNet-E achieves 59.9% Top-1 accuracy with only 24M FLOPs, about 1.9% higher than the previous best MobileNetV3 at a similar computational cost. Code will be available at https://github.com/huawei-noah/CV-Backbones/tree/main/tinynet, and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/tinynet.

* NeurIPS 2020 
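
A toy illustration of the scaling relation the abstract relies on: for a convolutional backbone, FLOPs grow roughly as d * w^2 * r^2 in the depth, width and resolution multipliers. The grid search below, which de-emphasizes width when spending a FLOPs budget, is our own simplification rather than the paper's exact tiny formula:

```python
# Relative FLOPs of a conv backbone under depth (d), width (w) and
# resolution (r) multipliers: FLOPs scale roughly as d * w^2 * r^2.
def relative_flops(d, w, r):
    return d * (w ** 2) * (r ** 2)

budget = 0.25  # target: ~25% of the base model's FLOPs
candidates = []
for d in [x / 10 for x in range(2, 11)]:      # depth multiplier
    for r in [x / 10 for x in range(2, 11)]:  # resolution multiplier
        w = (budget / (d * r ** 2)) ** 0.5    # width that meets the budget
        if 0.1 <= w <= 1.0:
            candidates.append((d, w, r))

# Prefer configurations that keep depth and resolution high, reflecting the
# paper's observation that these matter more than width for tiny networks.
best = max(candidates, key=lambda c: min(c[0], c[2]))
print("depth=%.2f width=%.2f resolution=%.2f" % best)
```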

Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time

Oct 22, 2020
Kai Han, Zongmai Cao, Shuang Cui, Benwei Wu


We study the problem of maximizing a non-monotone, non-negative submodular function subject to a matroid constraint. The best previously known deterministic approximation ratio for this problem is $\frac{1}{4}-\epsilon$ with $\mathcal{O}(({n^4}/{\epsilon})\log n)$ time complexity. We show that this deterministic ratio can be improved to $\frac{1}{4}$ with $\mathcal{O}(nr)$ time complexity, and then present a more practical algorithm dubbed TwinGreedyFast, which achieves a $\frac{1}{4}-\epsilon$ deterministic ratio in nearly linear running time of $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$. Our approach is based on a novel algorithmic framework that simultaneously constructs two candidate solution sets through greedy search, which enables us to obtain improved performance bounds by fully exploiting the properties of independence systems. As a byproduct of this framework, we also show that TwinGreedyFast achieves a $\frac{1}{2p+2}-\epsilon$ deterministic ratio under a $p$-set system constraint with the same time complexity. To showcase the practicality of our approach, we empirically evaluate the performance of TwinGreedyFast on two network applications and observe that it outperforms state-of-the-art deterministic and randomized algorithms with efficient implementations for our problem.

* Accepted to appear in the Thirty-Fourth Conference on Neural Information Processing Systems (NeurIPS 2020) 
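
A compact sketch of the twin-solution framework under stated assumptions: two disjoint candidate sets are grown simultaneously, each element going to the set where its marginal gain is largest among those where insertion preserves independence, and the better set is returned. The value oracle f and independence oracle are assumed inputs; the thresholding tricks that make TwinGreedyFast nearly linear are omitted:

```python
def twin_greedy(elements, f, is_independent):
    """Grow two disjoint candidate sets greedily; return the better one."""
    s1, s2 = set(), set()
    for e in sorted(elements, key=lambda e: f({e}), reverse=True):
        gains = []
        for s in (s1, s2):
            if is_independent(s | {e}):
                gains.append((f(s | {e}) - f(s), s))
        if gains:
            gain, target = max(gains, key=lambda g: g[0])
            if gain > 0:
                target.add(e)
    return max((s1, s2), key=f)

# Toy example: coverage function under a cardinality (rank-2) matroid.
universe = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"a", "d"}}
f = lambda s: len(set().union(*(universe[e] for e in s))) if s else 0
print(twin_greedy(universe.keys(), f, lambda s: len(s) <= 2))
```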

Training Binary Neural Networks through Learning with Noisy Supervision

Oct 10, 2020
Kai Han, Yunhe Wang, Yixing Xu, Chunjing Xu, Enhua Wu, Chang Xu


This paper formalizes the binarization operations over neural networks from a learning perspective. In contrast to classical hand-crafted rules (e.g., hard thresholding) for binarizing full-precision neurons, we propose to learn a mapping from full-precision neurons to the target binary ones. The individual weight entries are not binarized independently; instead, they are taken as a whole to accomplish the binarization, just as they work together in generating convolution features. To help train the binarization mapping, the full-precision neurons after taking the sign operation are regarded as an auxiliary supervision signal, which is noisy but still provides valuable guidance. An unbiased estimator is therefore introduced to mitigate the influence of the supervision noise. Experimental results on benchmark datasets indicate that the proposed binarization technique attains consistent improvements over baselines.

* ICML 2020 
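
A hedged sketch of the core idea: treat the sign of the full-precision weights as a noisy label for a learned binarization mapping. The per-filter affine-plus-tanh mapping and the plain squared loss below are our own illustration; the paper's unbiased estimator for the label noise is not reproduced here:

```python
import torch
import torch.nn as nn

class LearnedBinarizer(nn.Module):
    """Maps full-precision filters to soft binary codes as a whole,
    rather than thresholding each weight entry independently."""
    def __init__(self, out_channels):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(out_channels, 1))
        self.bias = nn.Parameter(torch.zeros(out_channels, 1))

    def forward(self, w):
        # w: (out_channels, fan_in) full-precision weights -> codes in (-1, 1)
        return torch.tanh(self.scale * w.flatten(1) + self.bias)

w = torch.randn(16, 9, requires_grad=True)         # e.g. 16 filters of 3x3x1
binarizer = LearnedBinarizer(16)
soft_codes = binarizer(w)
noisy_targets = torch.sign(w.flatten(1)).detach()  # sign(w) as noisy labels
loss = ((soft_codes - noisy_targets) ** 2).mean()
loss.backward()                                    # trains mapping and w
print(loss.item())
```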

Searching for Low-Bit Weights in Quantized Neural Networks

Sep 18, 2020
Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, Dacheng Tao, Chang Xu


Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators. However, the quantization functions used in most conventional quantization methods are non-differentiable, which increases the difficulty of optimizing quantized networks. Compared with full-precision parameters (i.e., 32-bit floating-point numbers), low-bit values are selected from a much smaller set; for example, there are only 16 possibilities in the 4-bit space. We therefore propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables and to search them accurately with a differentiable method. In particular, each weight is represented as a probability distribution over the discrete value set. The probabilities are optimized during training, and the values with the highest probability are selected to establish the desired quantized network. Experimental results on benchmarks demonstrate that the proposed method produces quantized neural networks with higher performance than state-of-the-art methods on both image classification and super-resolution tasks.
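
A minimal sketch of the search formulation, assuming a hypothetical SearchableQuantLinear layer: every weight keeps a logit vector over the 16-value 4-bit set, training uses the probability-weighted soft weight, and the most probable value is selected afterwards. Temperature schedules and other training details from the paper are omitted:

```python
import torch
import torch.nn as nn

class SearchableQuantLinear(nn.Module):
    def __init__(self, in_f, out_f, num_levels=16):
        super().__init__()
        # Uniform 4-bit value set in [-1, 1]: 16 possibilities per weight.
        self.register_buffer("levels", torch.linspace(-1, 1, num_levels))
        self.logits = nn.Parameter(torch.zeros(out_f, in_f, num_levels))

    def forward(self, x):
        probs = torch.softmax(self.logits, dim=-1)
        w = (probs * self.levels).sum(-1)  # expected (soft) weight
        return x @ w.t()

    def discretize(self):
        # After training, keep the most probable discrete value per weight.
        return self.levels[self.logits.argmax(-1)]

layer = SearchableQuantLinear(8, 4)
layer(torch.randn(2, 8)).sum().backward()  # logits receive gradients
print(layer.discretize().shape)            # torch.Size([4, 8])
```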


Revisiting Modified Greedy Algorithm for Monotone Submodular Maximization with a Knapsack Constraint

Aug 12, 2020
Jing Tang, Xueyan Tang, Andrew Lim, Kai Han, Chongshou Li, Junsong Yuan


Monotone submodular maximization with a knapsack constraint is NP-hard, and various approximation algorithms have been devised for this optimization problem. In this paper, we revisit the widely known modified greedy algorithm. First, we show that this algorithm can achieve an approximation factor of $0.405$, which significantly improves the known factors of $0.357$ given by Wolsey and $(1-1/\mathrm{e})/2\approx 0.316$ given by Khuller et al. More importantly, our analysis uncovers a gap in Khuller et al.'s proof of the approximation factor $(1-1/\sqrt{\mathrm{e}})\approx 0.393$ extensively cited in the literature, clarifying a long-standing misunderstanding on this issue. Second, we enhance the modified greedy algorithm to derive a data-dependent upper bound on the optimum. We empirically demonstrate the tightness of our upper bound on a real-world application. The bound enables us to obtain a data-dependent ratio, typically much higher than $0.405$, between the solution value of the modified greedy algorithm and the optimum. It can also be used to significantly improve the efficiency of algorithms such as branch and bound.
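
For concreteness, a short sketch of the modified greedy algorithm the paper analyzes: repeatedly take the element with the best marginal-gain-to-cost ratio that still fits the budget, then return the better of the greedy set and the best feasible singleton. Here f is an assumed monotone submodular value oracle:

```python
def modified_greedy(elements, f, cost, budget):
    s, spent = set(), 0.0
    remaining = set(elements)
    while remaining:
        # Element with the highest marginal gain per unit cost.
        e = max(remaining, key=lambda e: (f(s | {e}) - f(s)) / cost[e])
        remaining.discard(e)
        if spent + cost[e] <= budget:
            s.add(e)
            spent += cost[e]
    best_single = max((e for e in elements if cost[e] <= budget),
                      key=lambda e: f({e}))
    return max([s, {best_single}], key=f)

# Toy coverage instance.
cover = {1: {"a", "b"}, 2: {"b", "c", "d"}, 3: {"e"}}
f = lambda s: len(set().union(*(cover[e] for e in s))) if s else 0
cost = {1: 1.0, 2: 2.0, 3: 1.5}
print(modified_greedy(cover, f, cost, budget=2.5))
```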


Deep Photometric Stereo for Non-Lambertian Surfaces

Jul 26, 2020
Guanying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong


This paper addresses the problem of photometric stereo, in both calibrated and uncalibrated scenarios, for non-Lambertian surfaces based on deep learning. We first introduce a fully convolutional deep network for calibrated photometric stereo, which we call PS-FCN. Unlike traditional approaches that adopt simplified reflectance models to make the problem tractable, our method directly learns the mapping from reflectance observations to surface normals, and is able to handle surfaces with general and unknown isotropic reflectance. At test time, PS-FCN takes an arbitrary number of images and their associated light directions as input and predicts a surface normal map of the scene in a fast feed-forward pass. To deal with the uncalibrated scenario where light directions are unknown, we introduce a new convolutional network, named LCNet, to estimate light directions from input images. The estimated light directions and the input images are then fed to PS-FCN to determine the surface normals. Our method does not require a pre-defined set of light directions and can handle multiple images in an order-agnostic manner. A thorough evaluation of our approach on both synthetic and real datasets shows that it outperforms state-of-the-art methods in both calibrated and uncalibrated scenarios.
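
A toy sketch of the order-agnostic fusion this kind of approach relies on: a shared feature extractor processes each (image, light direction) observation, and per-observation features are fused by element-wise max pooling, so the output is invariant to image order and count. Layer sizes here are illustrative, not the published PS-FCN architecture:

```python
import torch
import torch.nn as nn

class TinyPSFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-pixel input: 3 image channels + 3 light-direction channels.
        self.extract = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.regress = nn.Conv2d(32, 3, 3, padding=1)  # normal map head

    def forward(self, images, lights):
        # images: (N, 3, H, W); lights: (N, 3), one per observation.
        feats = []
        for img, l in zip(images, lights):
            l_map = l.view(3, 1, 1).expand(3, *img.shape[1:])
            feats.append(self.extract(torch.cat([img, l_map])[None]))
        fused = torch.stack(feats).max(dim=0).values  # order-agnostic fusion
        return nn.functional.normalize(self.regress(fused), dim=1)

net = TinyPSFCN()
normals = net(torch.randn(5, 3, 16, 16), torch.randn(5, 3))
print(normals.shape)  # torch.Size([1, 3, 16, 16]) unit-normal map
```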


LSD-C: Linearly Separable Deep Clusters

Jun 17, 2020
Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Kai Han, Andrea Vedaldi, Andrew Zisserman


We present LSD-C, a novel method for identifying clusters in an unlabeled dataset. Our algorithm first establishes pairwise connections in feature space between the samples of a minibatch based on a similarity metric. It then groups the connected samples into clusters and enforces linear separability between clusters. This is achieved by using the pairwise connections as targets for a binary cross-entropy loss on the predictions that the associated pairs of samples belong to the same cluster. In this way, the feature representation of the network evolves so that similar samples in feature space belong to the same linearly separable cluster. Our method draws inspiration from recent semi-supervised learning practice and combines our clustering algorithm with self-supervised pretraining and strong data augmentation. We show that our approach significantly outperforms competitors on popular public image benchmarks, including CIFAR 10/100, STL 10 and MNIST, as well as on the document classification dataset Reuters 10K.

* Code available at https://github.com/srebuffi/lsd-clusters 
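
A condensed sketch of the pairwise loss described above: connect minibatch samples whose features are similar, then train the cluster head with binary cross-entropy so that connected pairs are predicted to share a cluster. The cosine-similarity threshold and the inner-product form of the same-cluster probability are simplifications of the paper's recipe:

```python
import torch
import torch.nn.functional as F

def lsdc_pairwise_loss(features, cluster_logits, threshold=0.9):
    # Pairwise connection targets from feature similarity.
    sim = F.cosine_similarity(features[None], features[:, None], dim=-1)
    targets = (sim > threshold).float()
    # Probability that two samples fall in the same cluster: inner product
    # of their cluster-assignment distributions.
    probs = F.softmax(cluster_logits, dim=-1)
    same_cluster = (probs @ probs.t()).clamp(1e-6, 1 - 1e-6)
    return F.binary_cross_entropy(same_cluster, targets)

feats = torch.randn(8, 64)                       # minibatch features
logits = torch.randn(8, 10, requires_grad=True)  # 10 candidate clusters
loss = lsdc_pairwise_loss(feats, logits)
loss.backward()
print(loss.item())
```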

Dual-Resolution Correspondence Networks

Jun 16, 2020
Xinghui Li, Kai Han, Shuda Li, Victor Adrian Prisacariu


We tackle the problem of establishing dense pixel-wise correspondences between a pair of images. In this work, we introduce Dual-Resolution Correspondence Networks (DRC-Net) to obtain pixel-wise correspondences in a coarse-to-fine manner. DRC-Net extracts both coarse- and fine-resolution feature maps. The coarse maps are used to produce a full but coarse 4D correlation tensor, which is then refined by a learnable neighbourhood consensus module. The fine-resolution feature maps are used to obtain the final dense correspondences, guided by the refined coarse 4D correlation tensor. The selected coarse-resolution matching scores allow the fine-resolution features to focus on only a limited number of possible matches with high confidence. In this way, DRC-Net dramatically increases matching reliability and localisation accuracy while avoiding the application of expensive 4D convolution kernels to fine-resolution feature maps. We comprehensively evaluate our method on large-scale public benchmarks, including HPatches, InLoc, and Aachen Day-Night, and achieve state-of-the-art results on all of them.
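
A small sketch of the coarse 4D correlation tensor mentioned above: every spatial location in one coarse feature map is correlated with every location in the other. The neighbourhood-consensus refinement and fine-level matching stages are omitted:

```python
import torch
import torch.nn.functional as F

def correlation_4d(feat_a, feat_b):
    # feat_a, feat_b: (C, H, W) L2-normalized coarse feature maps.
    c, h, w = feat_a.shape
    corr = feat_a.reshape(c, h * w).t() @ feat_b.reshape(c, h * w)
    return corr.reshape(h, w, h, w)  # similarity of every pair of locations

fa = F.normalize(torch.randn(128, 16, 16), dim=0)
fb = F.normalize(torch.randn(128, 16, 16), dim=0)
corr = correlation_4d(fa, fb)
print(corr.shape)  # torch.Size([16, 16, 16, 16])

# Best coarse match in image B for each location in image A:
best = corr.reshape(16 * 16, -1).argmax(dim=-1)
```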


Anisotropic Convolutional Networks for 3D Semantic Scene Completion

Apr 05, 2020
Jie Li, Kai Han, Peng Wang, Yu Liu, Xia Yuan


As a voxel-wise labeling task, semantic scene completion (SSC) aims to simultaneously infer the occupancy and semantic labels of a scene from a single depth and/or RGB image. The key challenge for SSC is how to effectively exploit 3D context to model objects and stuff with large variations in shape, layout and visibility. To handle such variations, we propose a novel module called anisotropic convolution, which offers flexibility and power that competing methods such as standard 3D convolution and some of its variants cannot match. In contrast to standard 3D convolution, which is limited to a fixed 3D receptive field, our module models dimensional anisotropy voxel-wise. The basic idea is to enable an anisotropic 3D receptive field by decomposing a 3D convolution into three consecutive 1D convolutions, with the kernel size of each 1D convolution adaptively determined on the fly. By stacking multiple such anisotropic convolution modules, the voxel-wise modeling capability can be further enhanced while keeping the number of model parameters under control. Extensive experiments on two SSC benchmarks, NYU-Depth-v2 and NYUCAD, show the superior performance of the proposed method. Our code is available at https://waterljwant.github.io/SSC/
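
A rough sketch of the decomposition idea under stated assumptions: one k x k x k 3D convolution is replaced by three consecutive 1D convolutions, one per axis, and each axis's kernel size is chosen adaptively by softly weighting candidate sizes. The global softmax gate below stands in for the paper's voxel-wise selection mechanism:

```python
import torch
import torch.nn as nn

class AnisotropicConv(nn.Module):
    def __init__(self, channels, sizes=(1, 3, 5)):
        super().__init__()
        # One bank of 1D convs per axis (D, H, W), one conv per kernel size.
        self.banks = nn.ModuleList()
        for axis in range(3):
            kernels = []
            for s in sizes:
                k = [1, 1, 1]
                k[axis] = s
                kernels.append(tuple(k))
            self.banks.append(nn.ModuleList([
                nn.Conv3d(channels, channels, k,
                          padding=tuple(x // 2 for x in k))
                for k in kernels]))
        self.gates = nn.Parameter(torch.zeros(3, len(sizes)))

    def forward(self, x):
        # Apply the three axis-wise 1D convolutions consecutively; mix the
        # candidate kernel sizes per axis with learned softmax weights.
        for axis, bank in enumerate(self.banks):
            w = torch.softmax(self.gates[axis], dim=0)
            x = sum(wi * conv(x) for wi, conv in zip(w, bank))
        return x

m = AnisotropicConv(8)
print(m(torch.randn(1, 8, 12, 12, 12)).shape)  # torch.Size([1, 8, 12, 12, 12])
```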
