Golnaz Ghiasi

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Dec 10, 2019

Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

Abstract: Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). Encoder-decoder architectures have been proposed to resolve this by applying a decoder network on top of a backbone model designed for classification tasks. In this paper, we argue that the encoder-decoder architecture is ineffective at generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. SpineNet achieves state-of-the-art performance for a one-stage object detector on COCO with 60% less computation, and outperforms ResNet-FPN counterparts by 6% AP. The SpineNet architecture can transfer to classification tasks, achieving a 6% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset.

MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

Dec 02, 2019

Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adams, Quoc V. Le

Abstract: Despite the blooming success of architecture search for vision tasks in resource-constrained environments, the design of on-device object detection architectures has mostly been manual. The few automated search efforts are either centered around non-mobile-friendly search spaces or not guided by on-device latency. We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models. The learned MnasFPN head, when paired with a MobileNetV2 body, outperforms MobileNetV3+SSDLite by 1.8 mAP at similar latency on Pixel. It is also both 1.0 mAP more accurate and 10% faster than NAS-FPNLite. Ablation studies show that the majority of the performance gain comes from innovations in the search space. Further explorations reveal an interesting coupling between the search space design and the search algorithm, and suggest that the complexity of the MnasFPN search space may be at a local optimum.

* 10 pages, 7 figures

Learning Data Augmentation Strategies for Object Detection

Jun 26, 2019

Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

Abstract: Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost of annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this work, we study the impact of data augmentation on object detection. We first demonstrate that data augmentation operations borrowed from image classification may be helpful for training detection models, but the improvement is limited. Thus, we investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allows a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified with COCO improves a strong baseline on PASCAL-VOC by +2.7 mAP. Our results also reveal that a learned augmentation policy is superior to state-of-the-art architecture regularization methods for object detection, even when considering strong baselines. Code for training with the learned policy is available online at https://github.com/tensorflow/tpu/tree/master/models/official/detection

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Apr 16, 2019

Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le

Abstract: Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better feature pyramid network architecture for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales. NAS-FPN, combined with various backbone models in the RetinaNet framework, achieves a better accuracy and latency tradeoff than state-of-the-art object detection models. NAS-FPN improves mobile detection accuracy by 2 AP compared to the state-of-the-art SSDLite with MobileNetV2 model in [32], and achieves 48.3 AP, which surpasses Mask R-CNN [10] detection accuracy with less computation time.

* Accepted at CVPR 2019

Adjustable Real-time Style Transfer

Nov 21, 2018

Mohammad Babaeizadeh, Golnaz Ghiasi

Abstract: Artistic style transfer is the problem of synthesizing an image with content similar to a given image and style similar to another. Although recent feed-forward neural networks can generate stylized images in real-time, these models produce a single stylization given a pair of style/content images, and the user has no control over the synthesized output. Moreover, the style transfer depends on the hyper-parameters of the model, whose "optimum" varies across input images. Therefore, if the stylized output is not appealing to the user, she/he has to try multiple models or retrain one with different hyper-parameters to get a favorite stylization. In this paper, we address these issues by proposing a novel method which allows adjustment of crucial hyper-parameters, after training and in real-time, through a set of manually adjustable parameters. These parameters enable the user to modify the synthesized outputs from the same pair of style/content images, in search of a favorite stylized image. Our quantitative and qualitative experiments indicate that adjusting these parameters is comparable to retraining the model with different hyper-parameters. We also demonstrate how these parameters can be randomized to generate results which are diverse but still very similar in style and content.

DropBlock: A regularization method for convolutional networks

Oct 30, 2018

Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

Abstract: Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. Although dropout is widely used as a regularization technique for fully connected layers, it is often less effective for convolutional layers. This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated, so information can still flow through convolutional networks despite dropout. Thus a structured form of dropout is needed to regularize convolutional networks. In this paper, we introduce DropBlock, a form of structured dropout, where units in a contiguous region of a feature map are dropped together. We found that applying DropBlock in skip connections in addition to the convolution layers increases the accuracy. Also, gradually increasing the number of dropped units during training leads to better accuracy and more robustness to hyperparameter choices. Extensive experiments show that DropBlock works better than dropout in regularizing convolutional networks. On ImageNet classification, the ResNet-50 architecture with DropBlock achieves $78.13\%$ accuracy, which is more than a $1.6\%$ improvement on the baseline. On COCO detection, DropBlock improves the Average Precision of RetinaNet from $36.8\%$ to $38.4\%$.

* Accepted at NIPS 2018
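
The structured dropout described in the abstract above (dropping a contiguous region of a feature map together) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `drop_block`, the single-channel `(H, W)` input, the seed-rate formula for `gamma`, and the rescaling of the kept units are all assumptions made for illustration.

```python
import numpy as np

def drop_block(x, block_size=3, keep_prob=0.9, rng=None):
    """Zero out square blocks of a 2-D feature map (illustrative sketch).

    x: array of shape (H, W). All parameter names here are hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = x.shape
    # Seeds are top-left corners, restricted so every block fits in the map.
    vh, vw = h - block_size + 1, w - block_size + 1
    # Assumed seed rate: chosen so roughly (1 - keep_prob) of units are dropped.
    gamma = (1.0 - keep_prob) / block_size**2 * (h * w) / (vh * vw)
    seeds = rng.random((vh, vw)) < gamma
    mask = np.ones((h, w), dtype=float)
    for i, j in zip(*np.nonzero(seeds)):
        # Drop the whole contiguous block, not just the seed unit.
        mask[i:i + block_size, j:j + block_size] = 0.0
    kept = mask.sum()
    if kept == 0:
        return np.zeros_like(x, dtype=float)
    # Rescale the surviving units so the overall activation scale is preserved.
    return x * mask * (mask.size / kept)
```

Calling `drop_block(features, block_size=3, keep_prob=0.9)` during training only, and leaving features untouched at evaluation, mirrors how ordinary dropout is typically used; a schedule that lowers `keep_prob` over training would correspond to the gradual increase in dropped units that the abstract mentions.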

Exploring the structure of a real-time, arbitrary neural artistic stylization network

Aug 24, 2017

Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens

Abstract: In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters directly from a style image. The model is successfully trained on a corpus of roughly 80,000 paintings and is able to generalize to paintings previously unobserved. We demonstrate that the learned embedding space is smooth, contains a rich structure, and organizes semantic information associated with paintings in an entirely unsupervised manner.

* Accepted as an oral presentation at British Machine Vision Conference (BMVC) 2017
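
The idea of predicting conditional instance normalization parameters from a style image, as described in the abstract above, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's network: the style embedding is taken as given, a single linear map stands in for the style prediction network, and the names `predict_norm_params`, `w_gamma`, and `w_beta` are hypothetical.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (shape (C, H, W)) over its spatial dims."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def predict_norm_params(style_embedding, w_gamma, w_beta):
    """Map a style embedding (D,) to per-channel scale and shift (C,).

    A linear map is an assumption standing in for a learned predictor.
    """
    gamma = style_embedding @ w_gamma
    beta = style_embedding @ w_beta
    return gamma, beta

def conditional_instance_norm(x, gamma, beta):
    """Apply style-dependent scale and shift after instance normalization."""
    return gamma[:, None, None] * instance_norm(x) + beta[:, None, None]
```

Because the scale and shift come from the style embedding rather than from a fixed table of styles, a new style image only changes `gamma` and `beta`; the content features and the rest of the transfer network are untouched in this sketch.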

Occlusion Coherence: Detecting and Localizing Occluded Faces

Aug 25, 2016

Golnaz Ghiasi, Charless C. Fowlkes

Abstract: The presence of occluders significantly impacts object recognition accuracy. However, occlusion is typically treated as an unstructured source of noise, and explicit models for occluders have lagged behind those for object appearance and shape. In this paper, we describe a hierarchical deformable part model for face detection and landmark localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. We test the model on several benchmarks for landmark localization and detection, including challenging new data sets featuring significant occlusion. We find that the addition of an explicit occlusion model yields a detection system that outperforms existing approaches for occluded instances while maintaining competitive accuracy in detection and landmark localization for unoccluded instances.

Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation

Jul 30, 2016

Golnaz Ghiasi, Charless C. Fowlkes

Abstract: CNN architectures have terrific recognition performance but rely on spatial pooling, which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. This paper makes two contributions: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization information. (2) We describe a multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher-resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps. This approach yields state-of-the-art semantic segmentation results on the PASCAL VOC and Cityscapes segmentation benchmarks without resorting to more complex random-field inference or instance-detection-driven architectures.
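
For readers unfamiliar with the Laplacian pyramid that the abstract above builds on, the following NumPy sketch shows the classical decomposition and its coarse-to-fine reconstruction. It is not the paper's segmentation architecture (there is no gating or learned refinement here), and it assumes a simple 2x2 average for downsampling and nearest-neighbor upsampling in place of the usual Gaussian filtering; input sides must be divisible by `2**levels`.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (assumed box filter)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbor repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [band_0, ..., band_{levels-1}, coarsest], finest band first."""
    pyramid = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        # Each band stores the detail lost when moving to the coarser level.
        pyramid.append(current - upsample(low))
        current = low
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Start from the coarsest level and add detail bands back, coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample(current) + band
    return current
```

The reconstruction loop is the part that relates to the abstract: a low-resolution estimate is repeatedly upsampled and corrected with higher-resolution detail, which is the role the skip connections play in the described architecture.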
