Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

"Image": models, code, and papers

Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark

Sep 22, 2019
Ke Li, Gang Wan, Gong Cheng, Liqiu Meng, Junwei Han

Substantial efforts have been devoted more recently to presenting various methods for object detection in optical remote sensing images. However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most of the existing datasets have some shortcomings, for example, the numbers of images and object categories are small scale, and the image diversity and variations are insufficient. These limitations greatly affect the development of deep learning based object detection methods. In the paper, we provide a comprehensive review of the recent deep learning based object detection progress in both the computer vision and earth observation communities. Then, we propose a large-scale, publicly available benchmark for object DetectIon in Optical Remote sensing images, which we name as DIOR. The dataset contains 23463 images and 192472 instances, covering 20 object classes. The proposed DIOR dataset 1) is large-scale on the object categories, on the object instance number, and on the total image number; 2) has a large range of object size variations, not only in terms of spatial resolutions, but also in the aspect of inter- and intra-class size variability across objects; 3) holds big variations as the images are obtained with different imaging conditions, weathers, seasons, and image quality; and 4) has high inter-class similarity and intra-class diversity. The proposed benchmark can help the researchers to develop and validate their data-driven methods. Finally, we evaluate several state-of-the-art approaches on our DIOR dataset to establish a baseline for future research.

Via

Access Paper or Ask Questions

Clothing Co-Parsing by Joint Image Segmentation and Labeling

Feb 03, 2015
Wei Yang, Ping Luo, Liang Lin

Figure 1 for Clothing Co-Parsing by Joint Image Segmentation and Labeling

Figure 2 for Clothing Co-Parsing by Joint Image Segmentation and Labeling

Figure 3 for Clothing Co-Parsing by Joint Image Segmentation and Labeling

Figure 4 for Clothing Co-Parsing by Joint Image Segmentation and Labeling

This paper aims at developing an integrated system of clothing co-parsing, in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations. We propose a data-driven framework consisting of two phases of inference. The first phase, referred as "image co-segmentation", iterates to extract consistent regions on images and jointly refines the regions over all images by employing the exemplar-SVM (E-SVM) technique [23]. In the second phase (i.e. "region co-labeling"), we construct a multi-image graphical model by taking the segmented regions as vertices, and incorporate several contexts of clothing configuration (e.g., item location and mutual interactions). The joint label assignment can be solved using the efficient Graph Cuts algorithm. In addition to evaluate our framework on the Fashionista dataset [30], we construct a dataset called CCP consisting of 2098 high-resolution street fashion photos to demonstrate the performance of our system. We achieve 90.29% / 88.23% segmentation accuracy and 65.52% / 63.89% recognition rate on the Fashionista and the CCP datasets, respectively, which are superior compared with state-of-the-art methods.

* Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on , vol., no., pp.3182,3189, 23-28 June 2014
* 8 pages, 5 figures, CVPR 2014

Via

Access Paper or Ask Questions

AtLoc: Attention Guided Camera Localization

Sep 08, 2019
Bing Wang, Changhao Chen, Chris Xiaoxuan Lu, Peijun Zhao, Niki Trigoni, Andrew Markham

Figure 1 for AtLoc: Attention Guided Camera Localization

Figure 2 for AtLoc: Attention Guided Camera Localization

Figure 3 for AtLoc: Attention Guided Camera Localization

Figure 4 for AtLoc: Attention Guided Camera Localization

Deep learning has achieved impressive results in camera localization, but current single-image techniques typically suffer from a lack of robustness, leading to large outliers. To some extent, this has been tackled by sequential (multi-images) or geometry constraint approaches, which can learn to reject dynamic objects and illumination conditions to achieve better performance. In this work, we show that attention can be used to force the network to focus on more geometrically robust objects and features, achieving state-of-the-art performance in common benchmark, even if using only a single image as input. Extensive experimental evidence is provided through public indoor and outdoor datasets. Through visualization of the saliency maps, we demonstrate how the network learns to reject dynamic objects, yielding superior global camera pose regression performance. The source code is avaliable at https://github.com/BingCS/AtLoc.

Via

Access Paper or Ask Questions

Deep Variational Networks with Exponential Weighting for Learning Computed Tomography

Jun 13, 2019
Valery Vishnevskiy, Richard Rau, Orcun Goksel

Figure 1 for Deep Variational Networks with Exponential Weighting for Learning Computed Tomography

Figure 2 for Deep Variational Networks with Exponential Weighting for Learning Computed Tomography

Figure 3 for Deep Variational Networks with Exponential Weighting for Learning Computed Tomography

Figure 4 for Deep Variational Networks with Exponential Weighting for Learning Computed Tomography

Tomographic image reconstruction is relevant for many medical imaging modalities including X-ray, ultrasound (US) computed tomography (CT) and photoacoustics, for which the access to full angular range tomographic projections might be not available in clinical practice due to physical or time constraints. Reconstruction from incomplete data in low signal-to-noise ratio regime is a challenging and ill-posed inverse problem that usually leads to unsatisfactory image quality. While informative image priors may be learned using generic deep neural network architectures, the artefacts caused by an ill-conditioned design matrix often have global spatial support and cannot be efficiently filtered out by means of convolutions. In this paper we propose to learn an inverse mapping in an end-to-end fashion via unrolling optimization iterations of a prototypical reconstruction algorithm. We herein introduce a network architecture that performs filtering jointly in both sinogram and spatial domains. To efficiently train such deep network we propose a novel regularization approach based on deep exponential weighting. Experiments on US and X-ray CT data show that our proposed method is qualitatively and quantitatively superior to conventional non-linear reconstruction methods as well as state-of-the-art deep networks for image reconstruction. Fast inference time of the proposed algorithm allows for sophisticated reconstructions in real-time critical settings, demonstrated with US SoS imaging of an ex vivo bovine phantom.

* Accepted to MICCAI 2019

Via

Access Paper or Ask Questions

Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

May 21, 2016
Xingchao Peng, Judy Hoffman, Stella X. Yu, Kate Saenko

Figure 1 for Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

Figure 2 for Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

Figure 3 for Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

Figure 4 for Fine-to-coarse Knowledge Transfer For Low-Res Image Classification

We address the difficult problem of distinguishing fine-grained object categories in low resolution images. Wepropose a simple an effective deep learning approach that transfers fine-grained knowledge gained from high resolution training data to the coarse low-resolution test scenario. Such fine-to-coarse knowledge transfer has many real world applications, such as identifying objects in surveillance photos or satellite images where the image resolution at the test time is very low but plenty of high resolution photos of similar objects are available. Our extensive experiments on two standard benchmark datasets containing fine-grained car models and bird species demonstrate that our approach can effectively transfer fine-detail knowledge to coarse-detail imagery.

* 5 pages, accepted by ICIP 2016

Via

Access Paper or Ask Questions

Semi-supervised Learning using Adversarial Training with Good and Bad Samples

Oct 18, 2019
Wenyuan Li, Zichen Wang, Yuguang Yue, Jiayun Li, William Speier, Mingyuan Zhou, Corey W. Arnold

Figure 1 for Semi-supervised Learning using Adversarial Training with Good and Bad Samples

Figure 2 for Semi-supervised Learning using Adversarial Training with Good and Bad Samples

Figure 3 for Semi-supervised Learning using Adversarial Training with Good and Bad Samples

Figure 4 for Semi-supervised Learning using Adversarial Training with Good and Bad Samples

In this work, we investigate semi-supervised learning (SSL) for image classification using adversarial training. Previous results have illustrated that generative adversarial networks (GANs) can be used for multiple purposes. Triple-GAN, which aims to jointly optimize model components by incorporating three players, generates suitable image-label pairs to compensate for the lack of labeled data in SSL with improved benchmark performance. Conversely, Bad (or complementary) GAN, optimizes generation to produce complementary data-label pairs and force a classifier's decision boundary to lie between data manifolds. Although it generally outperforms Triple-GAN, Bad GAN is highly sensitive to the amount of labeled data used for training. Unifying these two approaches, we present unified-GAN (UGAN), a novel framework that enables a classifier to simultaneously learn from both good and bad samples through adversarial training. We perform extensive experiments on various datasets and demonstrate that UGAN: 1) achieves state-of-the-art performance among other deep generative models, and 2) is robust to variations in the amount of labeled data used for training.

Via

Access Paper or Ask Questions

Generalizable Cone Beam CT Esophagus Segmentation Using In Silico Data Augmentation

Jun 28, 2020
Sadegh R Alam, Tianfang Li, Si-Yuan Zhang, Pengpeng Zhang, Saad Nadeem

Figure 1 for Generalizable Cone Beam CT Esophagus Segmentation Using In Silico Data Augmentation

Figure 2 for Generalizable Cone Beam CT Esophagus Segmentation Using In Silico Data Augmentation

Figure 3 for Generalizable Cone Beam CT Esophagus Segmentation Using In Silico Data Augmentation

Figure 4 for Generalizable Cone Beam CT Esophagus Segmentation Using In Silico Data Augmentation

Lung cancer radiotherapy entails high quality planning computed tomography (pCT) imaging of the patient with radiation oncologist contouring of the tumor and the organs at risk (OARs) at the start of the treatment. This is followed by weekly low-quality cone beam CT (CBCT) imaging for treatment setup and qualitative visual assessment of tumor and critical OARs. In this work, we aim to make the weekly CBCT assessment quantitative by automatically segmenting the most critical OAR, esophagus, using deep learning and in silico (image-driven simulation) artifact induction to convert pCTs to pseudo-CBCTs (pCTs$+$artifacts). Specifically, for the in silico data augmentation, we make use of the critical insight that CT and CBCT have the same underlying physics and that it is easier to deteriorate the pCT to look more like CBCT (and use the accompanying high quality manual contours for segmentation) than to synthesize CT from CBCT where the critical anatomical information may have already been lost (which leads to anatomical hallucination with the prevalent generative adversarial networks for example). Given these pseudo-CBCTs and the high quality manual contours, we introduce a modified 3D-Unet architecture and a multi-objective loss function specifically designed for segmenting soft-tissue organs such as esophagus on real weekly CBCTs. The model achieved 0.74 dice overlap (against manual contours of an experienced radiation oncologist) on weekly CBCTs and was robust and generalizable enough to also produce state-of-the-art results on pCTs, achieving 0.77 dice overlap against the previous best of 0.72. This shows that our in silico data augmentation spans the realistic noise/artifact spectrum across patient CBCT/pCT data and can generalize well across modalities (without requiring retraining or domain adaptation), eventually improving the accuracy of treatment setup and response analysis.

* 13 pages, 7 figures

Via

Access Paper or Ask Questions

Large-scale representation learning from visually grounded untranscribed speech

Sep 19, 2019
Gabriel Ilharco, Yuan Zhang, Jason Baldridge

Figure 1 for Large-scale representation learning from visually grounded untranscribed speech

Figure 2 for Large-scale representation learning from visually grounded untranscribed speech

Figure 3 for Large-scale representation learning from visually grounded untranscribed speech

Figure 4 for Large-scale representation learning from visually grounded untranscribed speech

Systems that can associate images with their spoken audio captions are an important step towards visually grounded language learning. We describe a scalable method to automatically generate diverse audio for image captioning datasets. This supports pretraining deep networks for encoding both audio and images, which we do via a dual encoder that learns to align latent representations from both modalities. We show that a masked margin softmax loss for such models is superior to the standard triplet loss. We fine-tune these models on the Flickr8k Audio Captions Corpus and obtain state-of-the-art results---improving recall in the top 10 from 29.6% to 49.5%. We also obtain human ratings on retrieval outputs to better assess the impact of incidentally matching image-caption pairs that were not associated in the data, finding that automatic evaluation substantially underestimates the quality of the retrieved results.

* The SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2019

Via

Access Paper or Ask Questions

Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification

Apr 21, 2020
Wei Zhu, Haofu Liao, Wenbin Li, Weijian Li, Jiebo Luo

Figure 1 for Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification

Figure 2 for Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification

Figure 3 for Alleviating the Incompatibility between Cross Entropy Loss and Episode Training for Few-shot Skin Disease Classification

Skin disease classification from images is crucial to dermatological diagnosis. However, identifying skin lesions involves a variety of aspects in terms of size, color, shape, and texture. To make matters worse, many categories only contain very few samples, posing great challenges to conventional machine learning algorithms and even human experts. Inspired by the recent success of Few-Shot Learning (FSL) in natural image classification, we propose to apply FSL to skin disease identification to address the extreme scarcity of training sample problem. However, directly applying FSL to this task does not work well in practice, and we find that the problem can be largely attributed to the incompatibility between Cross Entropy (CE) and episode training, which are both commonly used in FSL. Based on a detailed analysis, we propose the Query-Relative (QR) loss, which proves superior to CE under episode training and is closely related to recently proposed mutual information estimation. Moreover, we further strengthen the proposed QR loss with a novel adaptive hard margin strategy. Comprehensive experiments validate the effectiveness of the proposed FSL scheme and the possibility to diagnosis rare skin disease with a few labeled samples.

Via

Access Paper or Ask Questions

3D Printed Brain-Controlled Robot-Arm Prosthetic via Embedded Deep Learning from sEMG Sensors

May 04, 2020
David Lonsdale, Li Zhang, Richard Jiang

Figure 1 for 3D Printed Brain-Controlled Robot-Arm Prosthetic via Embedded Deep Learning from sEMG Sensors

Figure 2 for 3D Printed Brain-Controlled Robot-Arm Prosthetic via Embedded Deep Learning from sEMG Sensors

Figure 3 for 3D Printed Brain-Controlled Robot-Arm Prosthetic via Embedded Deep Learning from sEMG Sensors

Figure 4 for 3D Printed Brain-Controlled Robot-Arm Prosthetic via Embedded Deep Learning from sEMG Sensors

In this paper, we present our work on developing robot arm prosthetic via deep learning. Our work proposes to use transfer learning techniques applied to the Google Inception model to retrain the final layer for surface electromyography (sEMG) classification. Data have been collected using the Thalmic Labs Myo Armband and used to generate graph images comprised of 8 subplots per image containing sEMG data captured from 40 data points per sensor, corresponding to the array of 8 sEMG sensors in the armband. Data captured were then classified into four categories (Fist, Thumbs Up, Open Hand, Rest) via using a deep learning model, Inception-v3, with transfer learning to train the model for accurate prediction of each on real-time input of new data. This trained model was then downloaded to the ARM processor based embedding system to enable the brain-controlled robot-arm prosthetic manufactured from our 3D printer. Testing of the functionality of the method, a robotic arm was produced using a 3D printer and off-the-shelf hardware to control it. SSH communication protocols are employed to execute python files hosted on an embedded Raspberry Pi with ARM processors to trigger movement on the robot arm of the predicted gesture.

* ICMLC 2020
* The 2020 International Conference on Machine Learning and Cybernetics (ICMLC), http://www.icmlc.com/

Via

Access Paper or Ask Questions