Ling Shao

Terminus Group, Beijing, China

3D IoU-Net: IoU Guided 3D Object Detector for Point Clouds

Apr 10, 2020

Jiale Li, Shujie Luo, Ziqi Zhu, Hang Dai, Andrey S. Krylov, Yong Ding, Ling Shao

Abstract: Most existing point cloud based 3D object detectors focus on the tasks of classification and box regression. However, another bottleneck in this area is achieving an accurate detection confidence for the Non-Maximum Suppression (NMS) post-processing. In this paper, we add a 3D IoU prediction branch to the regular classification and regression branches. The predicted IoU is used as the detection confidence for NMS. In order to obtain a more accurate IoU prediction, we propose a 3D IoU-Net with IoU sensitive feature learning and an IoU alignment operation. To obtain a perspective-invariant prediction head, we propose an Attentive Corner Aggregation (ACA) module that aggregates a local point cloud feature from the perspective of each of the eight corners and adaptively weights the contribution of each perspective with different attentions. We propose a Corner Geometry Encoding (CGE) module for geometry information embedding. To the best of our knowledge, this is the first time geometric embedding information has been introduced in proposal feature learning. These two feature parts are then adaptively fused by a multi-layer perceptron (MLP) network as our IoU sensitive feature. The IoU alignment operation is introduced to resolve the mismatch between the bounding box regression head and IoU prediction, thereby further enhancing the accuracy of IoU prediction. The experimental results on the KITTI car detection benchmark show that 3D IoU-Net with IoU perception achieves state-of-the-art performance.
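
To illustrate why the confidence used for NMS matters, the following is a minimal sketch, not the authors' implementation. It assumes axis-aligned 3D boxes (the paper works with general 3D detections), and the function names, box values, scores, and the 0.5 threshold are all hypothetical. The same greedy NMS is run twice: once ranked by classification score, once ranked by a predicted IoU.

```python
def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, z1, x2, y2, z2)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:          # no overlap along this axis
            return 0.0
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def nms(boxes, confidences, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending confidence, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_3d_axis_aligned(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 cover the same object, box 2 is separate.
boxes = [
    (0.0, 0.0, 0.0, 2.0, 2.0, 2.0),
    (0.2, 0.0, 0.0, 2.2, 2.0, 2.0),
    (5.0, 5.0, 5.0, 6.0, 6.0, 6.0),
]
cls_scores = [0.9, 0.6, 0.8]        # classification confidence (made up)
predicted_ious = [0.5, 0.85, 0.7]   # output of an IoU branch (made up)

kept_by_cls = nms(boxes, cls_scores)
kept_by_iou = nms(boxes, predicted_ious)
print(kept_by_cls, kept_by_iou)
```

In this toy case the two rankings keep different boxes for the duplicated object: the classification score favors box 0, while the predicted IoU favors box 1, which is the box the IoU branch considers better localized.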

* 11 pages, 9 figures

FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points

Apr 04, 2020

Ahmed H. Shahin, Prateek Munjal, Ling Shao, Shadab Khan

Abstract: Semantic segmentation from user inputs has been actively studied to facilitate interactive segmentation for data annotation and other applications. Recent studies have shown that extreme points can be effectively used to encode user inputs. A heat map generated from the extreme points can be appended to the RGB image and input to the model for training. In this study, we present FAIRS -- a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks. We propose a scalable encoding of the user input from extreme points and corrective clicks that allows the network to work with a variable number of clicks, including corrective clicks for output refinement. We also integrate a dual attention module with our approach to increase the efficacy of the model in preferentially attending to the objects. We demonstrate that these additions help achieve significant improvements over the state of the art in dense object segmentation from user inputs on multiple large-scale datasets. Through experiments, we demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.

Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation

Apr 03, 2020

Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, Ling Shao

Abstract: Medical image segmentation is an important task for computer aided diagnosis. Pixelwise manual annotation of large datasets requires high expertise and is time consuming. Conventional data augmentations have limited benefit because they do not fully represent the underlying distribution of the training set, which affects model robustness when tested on images captured from different sources. Prior work leverages synthetic images for data augmentation but ignores the interleaved geometric relationship between different anatomical labels. We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape. Latent space variable sampling results in diverse generated images from a base image and improves robustness. Given the augmented images generated by our method, we train the segmentation network to enhance the segmentation performance on retinal optical coherence tomography (OCT) images. The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset, which has images captured from different acquisition procedures. Ablation studies and visual analysis also demonstrate the benefits of integrating geometry and diversity.

* Accepted to CVPR 2020

Controllable Orthogonalization in Training DNNs

Apr 02, 2020

Lei Huang, Li Liu, Fan Zhu, Diwen Wan, Zehuan Yuan, Bo Li, Ling Shao

Abstract: Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1 and reduce redundancy in representation. This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI), to learn a layer-wise orthogonal weight matrix in DNNs. ONI works by iteratively stretching the singular values of a weight matrix towards 1. This property enables it to control the orthogonality of a weight matrix by its number of iterations. We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction. We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization (SN), and further outperforms SN by providing controllable orthogonality.
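
The idea of stretching singular values towards 1 by iteration can be sketched with the classical Newton-Schulz update X <- 1.5 X - 0.5 X X^T X. This is a generic textbook iteration used here for illustration and may differ in detail from the algorithm in the paper and its released code. The helper names and the example matrix are hypothetical, and the sketch assumes a full-rank matrix with no more rows than columns. Each step maps every singular value s to 1.5 s - 0.5 s^3, so more iterations give a more nearly orthogonal matrix.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalize(w, num_iters):
    """Push the singular values of w towards 1 with num_iters Newton-Schulz steps."""
    # Scale by the Frobenius norm so every singular value lies in (0, 1],
    # which is inside the convergence region of the iteration.
    fro = sum(v * v for row in w for v in row) ** 0.5
    x = [[v / fro for v in row] for row in w]
    for _ in range(num_iters):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def orthogonality_error(x):
    """Largest absolute entry of X X^T - I (0 means rows are orthonormal)."""
    gram = matmul(x, transpose(x))
    return max(abs(gram[i][j] - (1.0 if i == j else 0.0))
               for i in range(len(gram)) for j in range(len(gram)))


w = [[2.0, 1.0], [0.0, 1.0]]  # hypothetical weight matrix
errors = {t: orthogonality_error(newton_schulz_orthogonalize(w, t)) for t in (2, 5, 10)}
for t, err in errors.items():
    print(t, err)
```

Stopping after a small number of iterations leaves the matrix only partially orthogonalized, which is the sense in which the iteration count acts as a control knob.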

* Accepted to CVPR 2020. The Code is available at https://github.com/huangleiBuaa/ONI

Architecture Disentanglement for Deep Neural Networks

Mar 30, 2020

Jie Hu, Rongrong Ji, Qixiang Ye, Tong Tong, ShengChuan Zhang, Ke Li, Feiyue Huang, Ling Shao

Abstract: Deep Neural Networks (DNNs) are central to deep learning, and understanding their internal working mechanism is crucial if they are to be used for emerging applications in medical and industrial AI. To this end, the current line of research typically involves linking semantic concepts to a DNN's units or layers. However, this fails to capture the hierarchical inference procedure throughout the network. To address this issue, we introduce the novel concept of Neural Architecture Disentanglement (NAD) in this paper. Specifically, we disentangle a pre-trained network into hierarchical paths corresponding to specific concepts, forming the concept feature paths, i.e., the concept flows from the bottom to top layers of a DNN. Such paths further enable us to quantify the interpretability of DNNs according to the learned diversity of human concepts. We select four types of representative architectures ranging from handcrafted to autoML-based, and conduct extensive experiments on object-based and scene-based datasets. Our NAD sheds important light on the information flow of semantic concepts in DNNs, and provides a fundamental metric that will facilitate the design of interpretable network architectures. Code will be available at: https://github.com/hujiecpp/NAD.

An Investigation into the Stochasticity of Batch Whitening

Mar 27, 2020

Lei Huang, Lei Zhao, Yi Zhou, Fan Zhu, Li Liu, Ling Shao

Abstract: Batch Normalization (BN) is extensively employed in various network architectures by performing standardization within mini-batches. A full understanding of the process has been a central target in the deep learning community. Unlike existing works, which usually only analyze the standardization operation, this paper investigates the more general Batch Whitening (BW). Our work originates from the observation that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs). We attribute this phenomenon to the stochasticity that BW introduces. We quantitatively investigate the stochasticity of different whitening transformations and show that it correlates well with the optimization behaviors during training. We also investigate how stochasticity relates to the estimation of population statistics during inference. Based on our analysis, we provide a framework for designing and comparing BW algorithms in different scenarios. Our proposed BW algorithm improves residual networks by a significant margin on ImageNet classification. We also show that the stochasticity of BW can improve GAN performance, although at the cost of training stability.
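
For readers unfamiliar with whitening, the sketch below shows one well-known whitening transformation, based on the Cholesky factor of the mini-batch covariance. It is only an illustration of what "whitening a mini-batch" means; it is not the algorithm proposed in the paper, and the function names, the `eps` regularizer, and the toy batch are assumptions. Standardization in BN rescales each feature separately, whereas whitening also removes correlations between features, so the output covariance is close to the identity.

```python
def covariance(batch):
    """Return (centered samples, covariance matrix) for a list of feature rows."""
    n, d = len(batch), len(batch[0])
    mean = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in batch]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    return centered, cov


def cholesky_whiten(batch, eps=1e-5):
    """Whiten a mini-batch with the Cholesky factor of its covariance."""
    centered, cov = covariance(batch)
    d = len(cov)
    for i in range(d):
        cov[i][i] += eps  # small ridge for numerical stability (assumed value)
    # Cholesky decomposition: cov = L L^T with L lower triangular.
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]
    # Solve L y = x by forward substitution for every centered sample x.
    whitened = []
    for x in centered:
        y = [0.0] * d
        for i in range(d):
            y[i] = (x[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        whitened.append(y)
    return whitened


batch = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [0.0, 0.0]]  # toy data
_, cov_before = covariance(batch)
_, cov_after = covariance(cholesky_whiten(batch))
print(cov_before)
print(cov_after)
```

Because the mean and covariance are estimated from the mini-batch itself, the whitened output of a given sample depends on which other samples share its batch, which is the kind of stochasticity the abstract refers to.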

* Accepted to CVPR 2020. The Code is available at https://github.com/huangleiBuaa/StochasticityBW

Pedestrian Detection: The Elephant In The Room

Mar 22, 2020

Irtiza Hasan, Shengcai Liao, Jinpeng Li, Saad Ullah Akram, Ling Shao

Abstract: Pedestrian detection is used in many vision based applications ranging from video surveillance to autonomous driving. Although existing detectors achieve high performance, it is still largely unknown how well they generalize to unseen data. To this end, we conduct a comprehensive study in this paper, using a general principle of direct cross-dataset evaluation. Through this study, we find that existing state-of-the-art pedestrian detectors generalize poorly from one dataset to another. We demonstrate that there are two reasons for this trend. Firstly, they over-fit on popular datasets in a traditional single-dataset training and test pipeline. Secondly, the training source is generally neither dense in pedestrians nor diverse in scenarios. Accordingly, through experiments we find that a general purpose object detector works better in direct cross-dataset evaluation than state-of-the-art pedestrian detectors, and we illustrate that diverse and dense datasets, collected by crawling the web, serve as an efficient source of pre-training for pedestrian detection. Furthermore, we find that a progressive training pipeline works well for autonomous driving oriented detectors. We improve upon the previous state of the art on the reasonable/heavy subsets of the CityPersons dataset by 1.3%/1.7% and on Caltech by 1.8%/14.9% in terms of log average miss rate (MR^-2) points without any fine-tuning on the test set. A detector trained through the proposed pipeline achieves top rank on the leaderboards of CityPersons [42] and ECP [4]. Code and models will be available at https://github.com/hasanirtiza/Pedestron.

* 17 pages, 1 figure

Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification

Mar 17, 2020

Sanath Narayan, Akshita Gupta, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao

Abstract: Zero-shot learning strives to classify unseen categories for which no data is available during training. In the generalized variant, the test samples can further belong to seen or unseen categories. The state-of-the-art relies on Generative Adversarial Networks that synthesize unseen class features by leveraging class-specific semantic embeddings. During training, they generate semantically consistent features, but discard this constraint during feature synthesis and classification. We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification. We further introduce a feedback loop, from a semantic embedding decoder, that iteratively refines the generated features during both the training and feature synthesis stages. The synthesized features together with their corresponding latent embeddings from the decoder are transformed into discriminative features and utilized during classification to reduce ambiguities among categories. Experiments on (generalized) zero-shot learning for object and action classification reveal the benefit of semantic consistency and iterative feedback for GAN-based networks, outperforming existing methods on six zero-shot learning benchmarks.

CycleISP: Real Image Restoration via Improved Data Synthesis

Mar 17, 2020

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

Abstract: The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumption of additive white Gaussian noise (AWGN). While the CNNs achieve impressive results on these synthetic datasets, they do not perform well when applied to real camera images, as reported in recent benchmark datasets. This is mainly because AWGN is not adequate for modeling real camera noise, which is signal-dependent and heavily transformed by the camera imaging pipeline. In this paper, we present a framework that models the camera imaging pipeline in forward and reverse directions. It allows us to produce any number of realistic image pairs for denoising both in RAW and sRGB spaces. By training a new image denoising network on realistic synthetic data, we achieve state-of-the-art performance on real camera benchmark datasets. Our model has ~5 times fewer parameters than the previous best method for RAW denoising. Furthermore, we demonstrate that the proposed framework generalizes beyond the image denoising problem, e.g., to color matching in stereoscopic cinema. The source code and pre-trained models are available at https://github.com/swz30/CycleISP.
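
The difference between AWGN and signal-dependent noise can be illustrated with a small simulation. This is a hedged sketch and not the paper's noise model: it assumes a common heteroscedastic Gaussian approximation in which the noise variance grows linearly with the signal (`shot_gain * signal + read_var`), and all function names and parameter values are hypothetical. Under AWGN, dark and bright pixels receive noise of the same variance; under the signal-dependent model, bright pixels are noisier.

```python
import random


def add_awgn(signal, sigma, rng):
    """Additive white Gaussian noise: same standard deviation for every pixel."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def add_signal_dependent_noise(signal, shot_gain, read_var, rng):
    """Gaussian noise whose variance is shot_gain * s + read_var (assumed model)."""
    return [s + rng.gauss(0.0, (shot_gain * s + read_var) ** 0.5) for s in signal]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)           # fixed seed so the run is repeatable
dark = [0.1] * 5000              # flat dark patch, intensities in [0, 1]
bright = [0.9] * 5000            # flat bright patch

awgn_ratio = variance(add_awgn(bright, 0.05, rng)) / variance(add_awgn(dark, 0.05, rng))
dep_ratio = (variance(add_signal_dependent_noise(bright, 0.01, 0.0001, rng))
             / variance(add_signal_dependent_noise(dark, 0.01, 0.0001, rng)))

print("AWGN bright/dark variance ratio:", awgn_ratio)
print("signal-dependent bright/dark variance ratio:", dep_ratio)
```

With these assumed parameters the expected variance ratio is 1 for AWGN and about 8 for the signal-dependent model. A denoiser trained only on the first kind of data never sees that intensity dependence, and this sketch does not even include the further transformation by the camera pipeline that the abstract mentions.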

* CVPR 2020 (Oral)

Incremental Object Detection via Meta-Learning

Mar 17, 2020

K J Joseph, Jathushan Rajasegaran, Salman Khan, Fahad Shahbaz Khan, Vineeth Balasubramanian, Ling Shao

Abstract: In a real-world setting, object instances from new classes may be continuously encountered by object detectors. When existing object detectors are applied to such scenarios, their performance on old classes deteriorates significantly. A few efforts have been reported to address this limitation, all of which apply variants of knowledge distillation to avoid catastrophic forgetting. We note that although distillation helps to retain previous learning, it obstructs fast adaptability to new tasks, which is a critical requirement for incremental learning. To this end, we propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared. This ensures a seamless information transfer via a meta-learned gradient preconditioning that minimizes forgetting and maximizes knowledge transfer. In comparison to existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new classes and scales to large models for object detection. We evaluate our approach on a variety of incremental settings defined on the PASCAL-VOC and MS COCO datasets, demonstrating significant improvements over the state of the art.

* Preprint
