As an effective way to integrate the information contained in multiple medical images under different modalities, medical image synthesis and fusion have emerged in various clinical applications such as disease diagnosis and treatment planning. In this paper, an invertible and variable augmented network (iVAN) is proposed for medical image synthesis and fusion. In iVAN, the channel number of the network input and output is the same through variable augmentation technology, and data relevance is enhanced, which is conducive to the generation of characterization information. Meanwhile, the invertible network is used to achieve the bidirectional inference processes. Due to the invertible and variable augmentation schemes, iVAN can not only be applied to the mappings of multi-input to one-output and multi-input to multi-output, but also be applied to one-input to multi-output. Experimental results demonstrated that the proposed method can obtain competitive or superior performance in comparison to representative medical image synthesis and fusion methods.
Deep learning-based medical image segmentation technology aims at automatic recognizing and annotating objects on the medical image. Non-local attention and feature learning by multi-scale methods are widely used to model network, which drives progress in medical image segmentation. However, those attention mechanism methods have weakly non-local receptive fields' strengthened connection for small objects in medical images. Then, the features of important small objects in abstract or coarse feature maps may be deserted, which leads to unsatisfactory performance. Moreover, the existing multi-scale methods only simply focus on different sizes of view, whose sparse multi-scale features collected are not abundant enough for small objects segmentation. In this work, a multi-dimensional attention segmentation model with cascade multi-scale convolution is proposed to predict accurate segmentation for small objects in medical images. As the weight function, multi-dimensional attention modules provide coefficient modification for significant/informative small objects features. Furthermore, The cascade multi-scale convolution modules in each skip-connection path are exploited to capture multi-scale features in different semantic depth. The proposed method is evaluated on three datasets: KiTS19, Pancreas CT of Decathlon-10, and MICCAI 2018 LiTS Challenge, demonstrating better segmentation performances than the state-of-the-art baselines.
A physics assisted deep learning framework to perform accurate indoor imaging using phaseless Wi-Fi measurements is proposed. It is able to image objects that are large (compared to wavelength) and have high permittivity values, that existing radio frequency (RF) inverse scattering techniques find very challenging, making it suitable for indoor RF imaging. The technique utilizes a Rytov based inverse scattering model with a deep learning framework. The inverse scattering model is based on an extended Rytov approximation (xRA) that pre-reconstructs the RF measurements. Under strong scattering conditions, this pre-reconstruction is related to the actual permittivity profile by a non-linear function, which is learned by a modified U-Net model to obtain the permittivity profile of the object. Thus, our proposed approach not only reconstructs the shape of objects, but also estimates their permittivity values accurately. We demonstrate its imaging performance using simulations as well as experimental results in an actual indoor environment using 2.4 GHz Wi-Fi phaseless measurements. For incident wavelength $\lambda_0$, the proposed framework can reconstruct objects with relative permittivity as high as 77 and electrical size as large as $40 \lambda$, where $\lambda =\lambda_0/\sqrt{77}$. This is in contrast to existing phaseless imaging techniques which cannot reconstruct permittivity values beyond 3 or 4. Thus, our proposed method is the first inverse scattering-based deep learning framework which can image large scatterers with high permittivity and achieve accurate indoor RF imaging using phaseless Wi-Fi measurements.
Identification of minerals in the field is a task that is wrought with many challenges. Traditional approaches are prone to errors where there is no enough experience and expertise. Several existing techniques mainly make use of features of the minerals under a microscope and tend to favour a manual feature extraction pipeline. Deep learning methods can help overcome some of these hurdles and provide simple and effective ways to identify minerals. In this paper, we present an algorithm for identifying minerals from hand specimen images. Using a Convolutional Neural Network (CNN), we develop a single-label image classification model to identify and categorise seven classes of minerals. Experiments conducted using real-world datasets show that the model achieves an accuracy of 90.75%.
Recent CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality. However, these methods are limited in the sense that they can be trained only on fully labeled data. Due to various challenges in obtaining real world fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data and hence, generalize poorly to real-world images. The use of real-world data in training image deraining networks is relatively less explored in the literature. We propose a Gaussian Process-based semi-supervised learning framework which enables the network in learning to derain using synthetic dataset while generalizing better using unlabeled real-world images. Through extensive experiments and ablations on several challenging datasets (such as Rain800, Rain200H and DDN-SIRR), we show that the proposed method, when trained on limited labeled data, achieves on-par performance with fully-labeled training. Additionally, we demonstrate that using unlabeled real-world images in the proposed GP-based framework results in superior performance as compared to existing methods. Code is available at: https://github.com/rajeevyasarla/Syn2Real
Despite the recent developments in 3D Face Reconstruction from occluded and noisy face images, the performance is still unsatisfactory. One of the main challenges is to handle moderate to heavy occlusions in the face images. In addition, the noise in the face images inhibits the correct capture of facial attributes, thus needing to be reliably addressed. Moreover, most existing methods rely on additional dependencies, posing numerous constraints over the training procedure. Therefore, we propose a Self-Supervised RObustifying GUidancE (ROGUE) framework to obtain robustness against occlusions and noise in the face images. The proposed network contains 1) the Guidance Pipeline to obtain the 3D face coefficients for the clean faces, and 2) the Robustification Pipeline to acquire the consistency between the estimated coefficients for occluded or noisy images and the clean counterpart. The proposed image- and feature-level loss functions aid the ROGUE learning process without posing additional dependencies. On the three variations of the test dataset of CelebA: rational occlusions, delusional occlusions, and noisy face images, our method outperforms the current state-of-the-art method by large margins (e.g., for the shape-based 3D vertex errors, a reduction from 0.146 to 0.048 for rational occlusions, from 0.292 to 0.061 for delusional occlusions and from 0.269 to 0.053 for the noise in the face images), demonstrating the effectiveness of the proposed approach.
Bridging the semantic gap between image and question is an important step to improve the accuracy of the Visual Question Answering (VQA) task. However, most of the existing VQA methods focus on attention mechanisms or visual relations for reasoning the answer, while the features at different semantic levels are not fully utilized. In this paper, we present a new reasoning framework to fill the gap between visual features and semantic clues in the VQA task. Our method first extracts the features and predicates from the image and question. We then propose a new reasoning framework to effectively jointly learn these features and predicates in a coarse-to-fine manner. The intensively experimental results on three large-scale VQA datasets show that our proposed approach achieves superior accuracy comparing with other state-of-the-art methods. Furthermore, our reasoning framework also provides an explainable way to understand the decision of the deep neural network when predicting the answer.
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities using a training set of images annotated with labels belonging to a disjoint set of identities. Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level. Unlike fully unsupervised hierarchical clustering, the choice of grouping and complexity criteria stems naturally from supervision in the training set. The resulting method, Hi-LANDER, achieves an average of 54% improvement in F-score and 8% increase in Normalized Mutual Information (NMI) relative to current GNN-based clustering algorithms. Additionally, state-of-the-art GNN-based methods rely on separate models to predict linkage probabilities and node densities as intermediate steps of the clustering process. In contrast, our unified framework achieves a seven-fold decrease in computational cost. We release our training and inference code at https://github.com/dmlc/dgl/tree/master/examples/pytorch/hilander.
Visual noise is often regarded as a disturbance in image quality, whereas it can also provide a crucial clue for image-based forensic tasks. Conventionally, noise is assumed to comprise an additive Gaussian model to be estimated and then used to reveal anomalies. However, for real sensor noise, it should be modeled as signal-dependent noise (SDN). In this work, we apply SDN to splicing forgery localization tasks. Through statistical analysis of the SDN model, we assume that noise can be modeled as a Gaussian approximation for a certain brightness and propose a likelihood model for a noise level function. By building a maximum a posterior Markov random field (MAP-MRF) framework, we exploit the likelihood of noise to reveal the alien region of spliced objects, with a probability combination refinement strategy. To ensure a completely blind detection, an iterative alternating method is adopted to estimate the MRF parameters. Experimental results demonstrate that our method is effective and provides a comparative localization performance.
Knowledge distillation constitutes a simple yet effective way to improve the performance of a compact student network by exploiting the knowledge of a more powerful teacher. Nevertheless, the knowledge distillation literature remains limited to the scenario where the student and the teacher tackle the same task. Here, we investigate the problem of transferring knowledge not only across architectures but also across tasks. To this end, we study the case of object detection and, instead of following the standard detector-to-detector distillation approach, introduce a classifier-to-detector knowledge transfer framework. In particular, we propose strategies to exploit the classification teacher to improve both the detector's recognition accuracy and localization performance. Our experiments on several detectors with different backbones demonstrate the effectiveness of our approach, allowing us to outperform the state-of-the-art detector-to-detector distillation methods.