Hartwig Adam

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Aug 22, 2018

Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam

Abstract: Spatial pyramid pooling modules and encoder-decoder structures are both used in deep neural networks for semantic segmentation. The former encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages of both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 and Cityscapes datasets, achieving test set performance of 89.0% and 82.1%, respectively, without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at https://github.com/tensorflow/models/tree/master/research/deeplab.

* ECCV 2018 camera ready
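
The atrous separable convolution named in the title can be illustrated with a small 1-D sketch. This is not the reference implementation linked in the abstract; it is a plain-Python toy, and the function names, the 1-D layout, and the channel counts are all assumptions made for illustration. It shows the two steps (a dilated depthwise filter per channel, then a 1x1 pointwise mix) and why the factorization needs fewer weights than a standard convolution.

```python
from typing import List, Tuple

Signal = List[List[float]]  # hypothetical layout: [channel][position]


def depthwise_atrous(x: Signal, kernels: List[List[float]], rate: int) -> Signal:
    """Apply one dilated kernel per input channel, with no cross-channel mixing."""
    out = []
    for channel, kernel in zip(x, kernels):
        span = (len(kernel) - 1) * rate + 1  # input samples covered by the kernel
        out.append([
            sum(w * channel[s + i * rate] for i, w in enumerate(kernel))
            for s in range(len(channel) - span + 1)
        ])
    return out


def pointwise(x: Signal, weights: List[List[float]]) -> Signal:
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    length = len(x[0])
    return [
        [sum(row[c] * x[c][p] for c in range(len(x))) for p in range(length)]
        for row in weights
    ]


def atrous_separable_conv1d(x: Signal, depthwise_kernels: List[List[float]],
                            pointwise_weights: List[List[float]], rate: int) -> Signal:
    """Depthwise atrous filtering followed by a pointwise channel mix."""
    return pointwise(depthwise_atrous(x, depthwise_kernels, rate), pointwise_weights)


def conv2d_param_counts(k: int, c_in: int, c_out: int) -> Tuple[int, int]:
    """Weights (no bias) of a standard k x k conv vs. its separable factorization."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable
```

For a 3x3 layer with 256 input and 256 output channels (illustrative sizes), `conv2d_param_counts(3, 256, 256)` gives 589,824 weights for the standard convolution against 67,840 for the separable one. The atrous rate changes only where the kernel taps land, not how many weights there are.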

InclusiveFaceNet: Improving Face Attribute Detection with Race and Gender Diversity

Jul 17, 2018

Hee Jung Ryu, Hartwig Adam, Margaret Mitchell

Abstract: We demonstrate an approach to face attribute detection that retains or improves attribute detection accuracy across gender and race subgroups by learning demographic information prior to learning the attribute detection task. The system, which we call InclusiveFaceNet, detects face attributes by transferring race and gender representations learned from a held-out dataset of public race and gender identities. Leveraging learned demographic representations while withholding demographic inference from the downstream face attribute detection task preserves potential users' demographic privacy while resulting in some of the best reported numbers to date on attribute detection in the Faces of the World and CelebA datasets.

* Presented as a talk at the 2018 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018)

The iNaturalist Species Classification and Detection Dataset

Apr 10, 2018

Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie

Abstract: Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real-world conditions, we present the iNaturalist species classification and detection dataset, consisting of 859,000 images from over 5,000 different species of plants and animals. It features visually similar species, captured in a wide variety of situations, from all over the world. Images were collected with different camera types, have varying image quality, feature a large class imbalance, and have been verified by multiple citizen scientists. We discuss the collection of the dataset and present extensive baseline experiments using state-of-the-art computer vision classification and detection models. Results show that current non-ensemble-based methods achieve only 67% top-one classification accuracy, illustrating the difficulty of the dataset. Specifically, we observe poor results for classes with small numbers of training examples, suggesting more attention is needed in low-shot learning.

* CVPR 2018

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Dec 15, 2017

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko

Abstract: The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating-point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post-quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.

* 14 pages, 12 figures
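
The abstract does not spell out the quantization scheme, so the sketch below assumes a common affine mapping of the form r = S(q - Z), where S is a real-valued scale and Z an integer zero point. The function names, the 8-bit default, and the example range are hypothetical. The sketch covers only the mapping between real values and integers, not integer-only matrix multiplication or the co-designed training procedure.

```python
from typing import List, Tuple


def choose_params(r_min: float, r_max: float, num_bits: int = 8) -> Tuple[float, int]:
    """Pick a scale S and zero point Z so that [r_min, r_max] maps onto [0, 2^bits - 1]."""
    q_min, q_max = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that real zero is exactly representable.
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    scale = (r_max - r_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(q_min - r_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int,
             num_bits: int = 8) -> List[int]:
    """Map real values to clamped unsigned integers: q = round(r / S) + Z."""
    q_max = 2 ** num_bits - 1
    return [max(0, min(q_max, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate real values: r = S * (q - Z)."""
    return [scale * (q - zero_point) for q in q_values]
```

With an assumed range of [-1.0, 3.0], real zero maps to the zero point and back to exactly 0.0, and any in-range value is recovered to within half a quantization step.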

MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

Dec 13, 2017

Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, Hartwig Adam

Abstract: In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Because the model builds on top of the Faster-RCNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects of different semantic classes, including background, while the direction prediction, which estimates each pixel's direction towards its corresponding center, allows separating instances of the same semantic class. Moreover, we explore the effect of incorporating recent successful methods from both segmentation and detection (i.e., atrous convolution and hypercolumn). Our proposed model is evaluated on the COCO instance segmentation benchmark and shows comparable performance with other state-of-the-art models.

* 10 pages including reference
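
The direction-prediction idea, in which each pixel is labeled by the direction towards its instance center, can be sketched as a target-encoding function. The abstract gives no encoding details, so the number of bins, the (row, column) coordinate convention, and the function name below are assumptions for illustration only.

```python
import math
from typing import Tuple


def direction_bin(pixel: Tuple[float, float], center: Tuple[float, float],
                  num_bins: int = 8) -> int:
    """Quantize the direction from `pixel` towards `center` into one of `num_bins` sectors.

    Coordinates are (row, col). Bin 0 starts at angle 0 (center lies towards +col)
    and bins advance with increasing atan2(d_row, d_col).
    """
    d_row = center[0] - pixel[0]
    d_col = center[1] - pixel[1]
    angle = math.atan2(d_row, d_col) % (2 * math.pi)  # in [0, 2*pi)
    return int(angle // (2 * math.pi / num_bins)) % num_bins
```

Two pixels on opposite sides of the same center receive different bins, for example `direction_bin((0, 0), (1, 3))` and `direction_bin((2, 6), (1, 3))`. Pixels of two adjacent same-class instances point towards different centers, which is the cue that lets them be separated.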

Rethinking Atrous Convolution for Semantic Image Segmentation

Dec 05, 2017

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam

Abstract: In this work, we revisit atrous convolution, a powerful tool to explicitly adjust a filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. Furthermore, we propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context, which further boosts performance. We also elaborate on implementation details and share our experience on training our system. The proposed 'DeepLabv3' system significantly improves over our previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

* Add more experimental results
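
How an atrous rate adjusts a filter's field-of-view can be shown with a minimal 1-D sketch. It is a toy in plain Python, not the DeepLabv3 implementation; the function names and the 'valid' (no padding) boundary handling are assumptions chosen to keep the example short.

```python
from typing import List


def effective_field_of_view(kernel_size: int, rate: int) -> int:
    """Input samples spanned by a kernel whose taps are `rate` apart."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal: List[float], kernel: List[float], rate: int) -> List[float]:
    """'Valid' 1-D atrous convolution (cross-correlation form, no padding).

    The kernel keeps the same number of weights; only the spacing between
    its taps grows with `rate`.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    span = effective_field_of_view(len(kernel), rate)
    return [
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]
```

Applying the same three-tap kernel at several rates to one input, for instance `[atrous_conv1d(x, k, r) for r in (1, 2, 4)]`, gives a rough picture of the parallel multi-rate probing described in the abstract: the weights are shared in size, but each rate sees a wider context (3, 5, and 9 samples respectively).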

Learning Unified Embedding for Apparel Recognition

Aug 15, 2017

Yang Song, Yuan Li, Bo Wu, Chao-Yeh Chen, Xiao Zhang, Hartwig Adam

Abstract: In apparel recognition, specialized models (e.g. models trained for a particular vertical like dresses) can significantly outperform general models (i.e. models that cover a wide range of verticals). Therefore, deep neural network models are often trained separately for different verticals. However, using specialized models for different verticals is not scalable and is expensive to deploy. This paper addresses the problem of learning one unified embedding model for multiple object verticals (e.g. all apparel classes) without sacrificing accuracy. The problem is tackled from two aspects: training data and training difficulty. On the training data aspect, we find that for a single model trained with triplet loss, there is an accuracy sweet spot in terms of how many verticals are trained together. To ease the training difficulty, a novel learning scheme is proposed that uses the output from specialized models as learning targets so that L2 loss can be used instead of triplet loss. This new loss makes the training easier and makes more efficient use of the feature space possible. The end result is a unified model which can achieve the same retrieval accuracy as a number of separate specialized models, while having the complexity of a single model. The effectiveness of our approach is shown in experiments.

* 8 pages
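
The contrast between the two losses mentioned in the abstract can be sketched as follows. The abstract gives neither formula, so the squared-distance triplet form, the margin value, and all function names here are assumptions. The point of the sketch is structural: triplet loss needs three embeddings and a margin, while the L2 scheme needs only the unified model's embedding and the specialized model's embedding of the same item.

```python
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge on distances: the positive should be closer than the negative by `margin`."""
    return max(0.0, squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin)


def l2_target_loss(unified_embedding: Sequence[float],
                   specialist_embedding: Sequence[float]) -> float:
    """Regress the unified model's embedding onto the specialized model's output."""
    return squared_l2(unified_embedding, specialist_embedding)
```

Every training example yields a nonzero gradient under the L2 target loss unless the unified embedding already matches its target, whereas a triplet that already satisfies the margin contributes nothing. That is one plausible reading of why the abstract describes the L2 scheme as easier to train.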
