"Image": models, code, and papers

PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image

Jan 08, 2019
Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, Jan Kautz

This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar surfaces from a single RGB image. PlaneRCNN employs a variant of Mask R-CNN to detect planes with their plane parameters and segmentation masks. PlaneRCNN then jointly refines all the segmentation masks with a novel loss that enforces consistency with a nearby view during training. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground truth, on which PlaneRCNN outperforms existing state-of-the-art methods by significant margins in plane detection, segmentation, and reconstruction metrics. PlaneRCNN is an important step towards robust plane extraction, which would have an immediate impact on a wide range of applications, including robotics, augmented reality, and virtual reality.
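
A minimal sketch of how a plane's parameters and its segmentation mask together give per-pixel depth, by intersecting each pixel's viewing ray with the plane. It assumes each detected plane is described by a unit normal n and offset d in camera coordinates (points X on the plane satisfy n · X = d) and a pinhole intrinsic matrix K. These conventions and the helper name `plane_depth_map` are illustrative assumptions, not PlaneRCNN's code.

```python
import numpy as np


def plane_depth_map(normal, offset, mask, K):
    """Depth of every masked pixel lying on the plane normal . X = offset.

    Hypothetical helper: assumes camera-frame plane parameters and a pinhole
    intrinsic matrix K. Pixels outside the mask (or whose rays are parallel
    to the plane) get depth 0.
    """
    h, w = mask.shape
    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    # Back-project to viewing rays r = K^-1 p; the last row of K^-1 is (0, 0, 1), so r_z = 1.
    rays = pixels @ np.linalg.inv(K).T
    # Ray-plane intersection: X = z * r and n . X = d give z = d / (n . r).
    denom = rays @ normal
    depth = np.zeros((h, w))
    valid = mask & (np.abs(denom) > 1e-8)
    depth[valid] = offset / denom[valid]
    return depth


K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
mask = np.zeros((48, 64), dtype=bool)
mask[10:20, 10:30] = True
# A fronto-parallel plane 2 units in front of the camera.
depth = plane_depth_map(np.array([0.0, 0.0, 1.0]), 2.0, mask, K)
```

Because r_z = 1, the intersection parameter z is directly the depth, so one division per pixel turns a (normal, offset, mask) triple into a piecewise-planar depth map.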

FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding

Dec 05, 2020
Maryam Rahnemoonfar, Tashnim Chowdhury, Argho Sarkar, Debvrat Varshney, Masoud Yari, Robin Murphy

Visual scene understanding is the core task in making crucial decisions in any computer vision system. Although popular computer vision datasets such as Cityscapes, MS-COCO, and PASCAL provide good benchmarks for several tasks (e.g., image classification, segmentation, object detection), they are hardly suitable for post-disaster damage assessment. On the other hand, existing natural disaster datasets consist mainly of satellite imagery, which has low spatial resolution and a long revisit period, so it cannot support quick and efficient damage assessment. Unmanned aerial vehicles (UAVs) can readily access difficult places during a disaster and collect the high-resolution imagery these computer vision tasks require. To address these issues, we present FloodNet, a high-resolution UAV imagery dataset captured after Hurricane Harvey that documents post-flood damage in the affected areas. The images are labeled pixel-wise for semantic segmentation, and questions are provided for visual question answering. FloodNet poses several challenges, including detecting flooded roads and buildings and distinguishing natural water from flood water. With advances in deep learning, such data make it possible to analyze a disaster's impact and build a precise understanding of the affected areas. In this paper, we compare and contrast the performance of baseline methods for image classification, semantic segmentation, and visual question answering on our dataset.

* 11 pages

Label-driven weakly-supervised learning for multimodal deformable image registration

Dec 24, 2017
Yipeng Hu, Marc Modat, Eli Gibson, Nooshin Ghavami, Ester Bonmati, Caroline M. Moore, Mark Emberton, J. Alison Noble, Dean C. Barratt, Tom Vercauteren

Spatially aligning medical images from different modalities remains a challenging task, especially for intraoperative applications that require fast and robust algorithms. We propose a weakly-supervised, label-driven formulation for learning 3D voxel correspondence from higher-level label correspondence, thereby bypassing classical intensity-based image similarity measures. During training, a convolutional neural network is optimised by outputting a dense displacement field (DDF) that warps a set of available anatomical labels from the moving image to match their corresponding counterparts in the fixed image. These label pairs, including solid organs, ducts, vessels, point landmarks and other ad hoc structures, are only required at training time and can be spatially aligned by minimising a cross-entropy function of the warped moving label and the fixed label. During inference, the trained network takes a new image pair to predict an optimal DDF, resulting in a fully-automatic, label-free, real-time and deformable registration. For interventional applications where large global transformations prevail, we also propose a neural network architecture to jointly optimise the global and local displacements. Experimental results are presented based on cross-validating registrations of 111 pairs of T2-weighted magnetic resonance images and 3D transrectal ultrasound images from prostate cancer patients with a total of over 4000 anatomical labels, yielding a median target registration error of 4.2 mm on landmark centroids and a median Dice of 0.88 on prostate glands.

* Accepted to ISBI 2018
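
The label-driven training signal can be illustrated in 2D with NumPy: a dense displacement field warps a moving label, and the warped label is scored against the fixed label with a cross-entropy and a Dice overlap. This is a simplified, single-scale sketch under assumed conventions (backward warping with bilinear interpolation, `ddf[..., 0]` as row and `ddf[..., 1]` as column displacement); the function names are hypothetical, and the CNN that would predict the DDF is omitted.

```python
import numpy as np


def warp_bilinear(label, ddf):
    """Backward-warp a 2D moving label: output (i, j) samples label at (i + dy, j + dx)."""
    h, w = label.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions, clamped to the image so border pixels repeat the edge.
    y = np.clip(rows + ddf[..., 0], 0, h - 1)
    x = np.clip(cols + ddf[..., 1], 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    # Bilinear interpolation between the four neighbouring pixels.
    top = label[y0, x0] * (1 - wx) + label[y0, x1] * wx
    bottom = label[y1, x0] * (1 - wx) + label[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def cross_entropy(warped, fixed, eps=1e-7):
    """Pixel-wise binary cross-entropy between the warped moving label and the fixed label."""
    p = np.clip(warped, eps, 1 - eps)
    return float(-np.mean(fixed * np.log(p) + (1 - fixed) * np.log(1 - p)))


def dice(a, b, threshold=0.5):
    """Dice overlap of two labels after thresholding."""
    a, b = a > threshold, b > threshold
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


fixed = np.zeros((32, 32))
fixed[8:16, 10:20] = 1.0
moving = np.zeros((32, 32))
moving[8:16, 13:23] = 1.0  # same structure, shifted 3 columns right
ddf = np.zeros((32, 32, 2))
ddf[..., 1] = 3.0  # each output pixel samples 3 columns to its right
warped = warp_bilinear(moving, ddf)
```

With this hand-set field the warped moving square lands exactly on the fixed square, so Dice rises to 1 and the cross-entropy falls; during training, the network's predicted DDF would take the place of the hand-set one.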

CheXseg: Combining Expert Annotations with DNN-generated Saliency Maps for X-ray Segmentation

Feb 21, 2021
Soham Gadgil, Mark Endo, Emily Wen, Andrew Y. Ng, Pranav Rajpurkar

Medical image segmentation models are typically supervised by expert annotations at the pixel level, which can be expensive to acquire. In this work, we propose a method that combines the high quality of pixel-level expert annotations with the scale of coarse DNN-generated saliency maps for training multi-label semantic segmentation models. We demonstrate the application of our semi-supervised method, which we call CheXseg, on multi-label chest X-ray interpretation. We find that CheXseg improves upon the performance (mIoU) of fully-supervised methods that use only pixel-level expert annotations by 13.4% and of weakly-supervised methods that use only DNN-generated saliency maps by 91.2%. Furthermore, we implement a semi-supervised method using knowledge distillation and find that, although it is outperformed by CheXseg, it exceeds the performance (mIoU) of the best fully-supervised method by 4.83%. Our best method matches radiologist agreement on three out of ten pathologies and reduces the overall performance gap by 71.6% compared to weakly-supervised methods.
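
Since CheXseg's results are reported as mIoU over multi-label masks, a short NumPy sketch of that metric may help. It assumes one binary mask per pathology and skips classes that are empty in both prediction and ground truth; that convention, and the helper name `mean_iou`, are assumptions and may differ from the paper's evaluation code.

```python
import numpy as np


def mean_iou(pred, target):
    """Mean IoU over classes for multi-label masks of shape (classes, H, W).

    Masks may overlap across classes; each class is scored independently.
    Classes absent from both prediction and target are skipped (assumed convention).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    ious = []
    for p, t in zip(pred, target):
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


target = np.zeros((3, 4, 4), dtype=bool)
target[0, :2] = True  # class 0: top two rows
target[1, 2:] = True  # class 1: bottom two rows; class 2 is empty everywhere
pred = target.copy()
pred[0] = False
pred[0, 0] = True     # class 0 predicted on the top row only
score = mean_iou(pred, target)  # (0.5 + 1.0) / 2 = 0.75, class 2 skipped
```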

Variational Inference for Deblending Crowded Starfields

Feb 04, 2021
Runjing Liu, Jon D. McAuliffe, Jeffrey Regier

In the image data collected by astronomical surveys, stars and galaxies often overlap. Deblending is the task of distinguishing and characterizing individual light sources in survey images. We propose StarNet, a fully Bayesian method to deblend sources in astronomical images of crowded star fields. StarNet leverages recent advances in variational inference, including amortized variational distributions and the wake-sleep algorithm. Wake-sleep, which minimizes the forward KL divergence, has significant benefits compared to traditional variational inference, which minimizes the reverse KL divergence. In our experiments with SDSS images of the M2 globular cluster, StarNet is substantially more accurate than two competing methods: Probabilistic Cataloging (PCAT), a method that uses MCMC for inference, and DAOPHOT, a software pipeline employed by SDSS for deblending. In addition, StarNet is as much as 100,000 times faster than PCAT, exhibiting the scaling characteristics necessary to perform fully Bayesian inference on modern astronomical surveys.

* 37 pages; 20 figures; 3 tables. Submitted to the Journal of the American Statistical Association
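
The forward/reverse KL distinction can be seen on a toy discrete example: a target p with two separated modes, compared against a broad candidate and a single-mode candidate. This only illustrates the asymmetry of KL divergence; it is not StarNet's inference code, and the distributions are made up for illustration.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


p = np.array([0.49, 0.02, 0.49])     # two well-separated modes
q_broad = np.array([1 / 3, 1 / 3, 1 / 3])  # spreads mass over both modes
q_one = np.array([0.98, 0.01, 0.01])  # concentrates on one mode

candidates = [("broad", q_broad), ("one", q_one)]
forward = {name: kl(p, q) for name, q in candidates}  # KL(p || q)
reverse = {name: kl(q, p) for name, q in candidates}  # KL(q || p)
print(min(forward, key=forward.get))  # broad
print(min(reverse, key=reverse.get))  # one
```

Forward KL charges heavily when q puts little mass where p has mass, so it favours the broad candidate that covers both modes; reverse KL favours the single-mode candidate. This mass-covering behaviour is a commonly cited property of forward-KL objectives such as the one the abstract attributes to wake-sleep.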

RIFT: Multi-modal Image Matching Based on Radiation-invariant Feature Transform

Apr 25, 2018
Jiayuan Li, Qingwu Hu, Mingyao Ai

Traditional feature matching methods such as the scale-invariant feature transform (SIFT) usually use image intensity or gradient information to detect and describe feature points; however, both intensity and gradient are sensitive to nonlinear radiation distortions (NRD). To solve this problem, this paper proposes a novel feature matching algorithm that is robust to large NRD, called the radiation-invariant feature transform (RIFT). RIFT makes three main contributions. First, RIFT uses phase congruency (PC) instead of image intensity for feature point detection; it considers both the number and the repeatability of feature points and detects both corner points and edge points on the PC map. Second, RIFT introduces a maximum index map (MIM) for feature description. The MIM is constructed from the log-Gabor convolution sequence and is much more robust to NRD than the traditional gradient map. Thus, RIFT not only substantially improves the stability of feature detection but also overcomes the limitations of gradient information for feature description. Third, RIFT analyzes the inherent influence of rotation on MIM values and achieves rotation invariance. We use six different types of multi-modal image datasets to evaluate RIFT, including optical-optical, infrared-optical, synthetic aperture radar (SAR)-optical, depth-optical, map-optical, and day-night datasets. Experimental results show that RIFT is substantially superior to SIFT and SAR-SIFT. To the best of our knowledge, RIFT is the first feature matching algorithm that achieves good performance on all of the above-mentioned types of multi-modal images. The source code of RIFT and the multi-modal remote sensing image datasets are publicly available.

* 14 pages, 17 figures
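
The maximum index map can be sketched with a frequency-domain log-Gabor filter bank in NumPy: filter the image at several scales and orientations, sum the amplitude over scales for each orientation, and record at each pixel the 1-based index of the orientation with the largest sum. The filter-bank parameters below are assumed values of the kind used in common phase-congruency implementations, not RIFT's published settings, and RIFT's rotation-invariance step is omitted; the function names are hypothetical.

```python
import numpy as np


def log_gabor_bank(shape, n_scales=4, n_orient=6, min_wavelength=3.0,
                   mult=1.6, sigma_onf=0.55):
    """One-sided log-Gabor filters in the frequency domain, shape (S, O, H, W)."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0  # avoid log(0); the DC response is zeroed below
    theta = np.arctan2(-fy, fx)
    sigma_theta = np.pi / n_orient / 1.2  # assumed angular spread
    bank = np.zeros((n_scales, n_orient, h, w))
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)  # centre frequency of scale s
        radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_onf) ** 2))
        radial[0, 0] = 0.0
        for o in range(n_orient):
            angle = o * np.pi / n_orient
            # Wrapped angular distance between each frequency and this orientation.
            d_theta = np.abs(np.arctan2(np.sin(theta - angle), np.cos(theta - angle)))
            bank[s, o] = radial * np.exp(-d_theta ** 2 / (2 * sigma_theta ** 2))
    return bank


def maximum_index_map(image, **bank_kwargs):
    """MIM sketch: per pixel, the 1-based orientation with the largest summed amplitude."""
    spectrum = np.fft.fft2(image)
    bank = log_gabor_bank(image.shape, **bank_kwargs)
    # One-sided filters give complex responses; their magnitude is the local amplitude.
    amplitude = np.abs(np.fft.ifft2(spectrum[None, None] * bank, axes=(-2, -1)))
    per_orientation = amplitude.sum(axis=0)  # sum over scales -> (O, H, W)
    return np.argmax(per_orientation, axis=0) + 1


x = np.arange(32)
stripes = np.tile(np.cos(2 * np.pi * x / 8), (32, 1))  # intensity varies along x only
mim = maximum_index_map(stripes)
```

For this stripe image all spectral energy lies on the horizontal frequency axis, so the filter oriented at 0 rad (index 1) wins at every pixel. Because the MIM stores an orientation index rather than an intensity or gradient value, it does not depend on the absolute brightness relationship between two modalities, which fits the robustness to NRD that the abstract describes.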

CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement

Mar 12, 2021
Noranart Vesdapunt, Baoyuan Wang

Face detection is a fundamental problem for many downstream face applications, and there is rising demand for face detectors that are faster and more accurate yet also support higher resolutions. Recent smartphones can record video in 8K resolution, but many existing face detectors still fail at such resolutions because of their anchor sizes and training data. We analyze the failure cases and observe a large number of correctly predicted boxes with incorrect confidences. To calibrate these confidences, we propose a confidence ranking network with a pairwise ranking loss that re-ranks the predicted confidences locally within the same image. Our confidence ranker is model-agnostic, so we can augment the data by choosing pairs from multiple face detectors during training and generalize to a wide range of face detectors during testing. On WiderFace, we achieve the highest AP in the single-scale setting, and our AP is competitive with previous multi-scale methods while being significantly faster. At 8K resolution, our method solves the GPU memory issue and allows us to train on 8K indirectly. We collect an 8K-resolution test set to show the improvement, and we will release it as a new benchmark for future research.

* CVPR 2021
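
A pairwise ranking loss of this kind can be sketched with a RankNet-style logistic loss: within one image, for every pair of predicted boxes where one overlaps the ground truth better than the other, the ranker is penalised unless it scores the better box higher. Using IoU with the ground truth as the ranking target, the logistic form, and the `margin` parameter are assumptions for illustration; CRFace's exact loss and pair selection may differ.

```python
import numpy as np


def pairwise_ranking_loss(scores, ious, margin=0.0):
    """Logistic pairwise ranking loss over the detections of a single image.

    scores: predicted confidences; ious: each box's IoU with its ground truth
    (the assumed ranking target). Returns 0.0 if no pair has a strict order.
    """
    scores = np.asarray(scores, dtype=float)
    ious = np.asarray(ious, dtype=float)
    # diff[i, j] = s_i - s_j; better[i, j] is True when box i should rank above box j.
    diff = scores[:, None] - scores[None, :]
    better = ious[:, None] > ious[None, :]
    if not better.any():
        return 0.0
    # softplus(-(s_i - s_j - margin)), computed stably as log(1 + exp(.)).
    losses = np.logaddexp(0.0, -(diff[better] - margin))
    return float(losses.mean())


ious = np.array([0.9, 0.5, 0.1])
good = pairwise_ranking_loss(np.array([2.0, 1.0, 0.0]), ious)  # order matches IoU
bad = pairwise_ranking_loss(np.array([0.0, 1.0, 2.0]), ious)   # order reversed
```

Only score differences between boxes of the same image enter the loss, so the ranker learns a relative order rather than absolute confidences, in line with the abstract's description of re-ranking locally within an image.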

Efficient Near-Field Imaging Using Cylindrical MIMO Arrays

Jan 22, 2021
Shiyong Li, Shuoguang Wang, Moeness G. Amin, Guoqiang Zhao

Multiple-input multiple-output (MIMO) array-based millimeter-wave (MMW) imaging shows tangible promise for concealed weapon detection. A near-field imaging algorithm based on wavenumber-domain processing is proposed for a cylindrical MIMO array scheme with uniformly spaced transmit and receive antennas along both the vertical and the horizontal-arc directions. The spectrum aliasing associated with the proposed MIMO array is analyzed through a zero-filling discrete-time Fourier transform. The analysis shows that an undersampled array can be used to recover the MMW image with a wavenumber-domain algorithm. The requirements on the antenna inter-element spacing of the MIMO array are delineated. Numerical simulations, as well as comparisons with the backprojection (BP) algorithm, are provided to demonstrate the effectiveness of the proposed method.

* 10 pages, 20 figures, paper submitted to IEEE Transactions on Aerospace and Electronic Systems
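
The abstract compares against the backprojection (BP) algorithm. A minimal delay-and-sum BP sketch in NumPy shows the idea: for each image pixel, undo the transmit-plus-receive path phase for every transmitter, receiver, and frequency, then sum coherently. It is simplified to a 2D geometry with a linear MIMO array on the z = 0 line rather than the paper's cylindrical array; the ideal point-target echo model, the array layout, and all function names are assumptions for illustration, and this is not the proposed wavenumber-domain algorithm.

```python
import numpy as np

C = 3e8  # speed of light (m/s)


def simulate_echo(tx, rx, freqs, target):
    """Ideal point-target echo, shape (n_tx, n_rx, n_freq)."""
    k = 2 * np.pi * freqs / C  # wavenumbers
    r_t = np.linalg.norm(tx - target, axis=1)  # transmitter-to-target distances
    r_r = np.linalg.norm(rx - target, axis=1)  # target-to-receiver distances
    path = r_t[:, None, None] + r_r[None, :, None]
    return np.exp(-1j * k[None, None, :] * path)


def backproject(echo, tx, rx, freqs, grid):
    """Delay-and-sum BP: compensate the round-trip phase at each pixel, then sum."""
    k = 2 * np.pi * freqs / C
    image = np.zeros(len(grid), dtype=complex)
    for n, p in enumerate(grid):
        path = (np.linalg.norm(tx - p, axis=1)[:, None, None]
                + np.linalg.norm(rx - p, axis=1)[None, :, None])
        image[n] = np.sum(echo * np.exp(1j * k[None, None, :] * path))
    return np.abs(image)


freqs = np.linspace(30e9, 35e9, 11)  # stepped-frequency sweep (Hz)
tx = np.column_stack([[-0.15, -0.10, 0.10, 0.15], np.zeros(4)])     # sparse transmitters (x, z)
rx = np.column_stack([np.linspace(-0.08, 0.08, 16), np.zeros(16)])  # dense receivers (x, z)
xs, zs = np.meshgrid(np.linspace(-0.1, 0.1, 21), np.linspace(0.4, 0.6, 11))
grid = np.column_stack([xs.ravel(), zs.ravel()])  # image pixels (x, z) in metres
target = grid[100]
image = backproject(simulate_echo(tx, rx, freqs, target), tx, rx, freqs, grid)
```

At the true target every phasor aligns, so the pixel reaches the maximum possible magnitude (transmitters × receivers × frequencies). The loop visits every pixel for every transmitter, receiver, and frequency, so BP's cost grows with their product, a common motivation for FFT-based wavenumber-domain reconstruction.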

Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection

Oct 22, 2020
Zeyi Huang, Yang Zou, Vijayakumar Bhagavatula, Dong Huang

Weakly Supervised Object Detection (WSOD) has emerged as an effective tool to train object detectors using only image-level category labels. However, without object-level labels, WSOD detectors are prone to detecting bounding boxes on salient objects, clustered objects, and discriminative object parts. Moreover, image-level category labels do not enforce consistent object detection across different transformations of the same image. To address these issues, we propose a Comprehensive Attention Self-Distillation (CASD) training approach for WSOD. To balance feature learning among all object instances, CASD computes the comprehensive attention aggregated from multiple transformations and feature layers of the same image. To enforce consistent spatial supervision on objects, CASD conducts self-distillation on the WSOD network, such that the comprehensive attention is approximated simultaneously by multiple transformations and feature layers of the same image. CASD produces new state-of-the-art WSOD results on standard benchmarks such as PASCAL VOC 2007/2012 and MS-COCO.

* Neural Information Processing Systems (NeurIPS 2020)
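
One way to sketch the comprehensive attention and its self-distillation target in NumPy: attention maps from several transformations (here the original image and its horizontal flip, with the flipped map flipped back into alignment) and from several feature layers are aggregated with an element-wise maximum, and each map is pulled towards that aggregate. The max aggregation, the squared-error distance, and the helper names are assumptions for illustration; in a real training loop the aggregate would be treated as a constant target with no gradient.

```python
import numpy as np


def align_views(att_orig, att_flip, att_shallow):
    """Bring per-view attention maps into the original image frame.

    att_flip was computed on the horizontally flipped image, so it is flipped
    back; att_shallow stands in for a map from another feature layer.
    """
    return [att_orig, att_flip[..., ::-1], att_shallow]


def comprehensive_attention(maps):
    """Element-wise max over aligned maps: keeps every region any view attends to."""
    return np.max(np.stack(maps), axis=0)


def self_distillation_loss(maps):
    """Mean squared gap between each aligned map and the aggregate target."""
    target = comprehensive_attention(maps)  # would be a stop-gradient target in training
    return float(np.mean([np.mean((m - target) ** 2) for m in maps]))


left = np.zeros((4, 4))
left[:, :2] = 1.0  # attention on the left half only
# The flipped view also attends to "its" left half, i.e. the right half of the original.
maps = align_views(left, left, left)
aggregate = comprehensive_attention(maps)
loss = self_distillation_loss(maps)
```

Here the two transformations disagree, so the aggregate covers the whole image and the loss pushes each view to attend to both halves; when all aligned views already agree, the loss is zero.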
