Inpainting is the technique of reconstructing unknown or damaged portions of an image in a visually plausible way. Inpainting algorithm automatically fills the damaged region in an image using the information available in undamaged region. Propagation of structure and texture information becomes a challenge as the size of damaged area increases. In this paper, a hierarchical inpainting algorithm using wavelets is proposed. The hierarchical method tries to keep the mask size smaller while wavelets help in handling the high pass structure information and low pass texture information separately. The performance of the proposed algorithm is tested using different factors. The results of our algorithm are compared with existing methods such as interpolation, diffusion and exemplar techniques.
Tubular structure tracking is an important and difficult problem in the fields of computer vision and medical image analysis. The minimal path models have exhibited its power in tracing tubular structures, by which a centerline can be naturally treated as a minimal path with a suitable geodesic metric. However, existing minimal path-based tubular structure tracing models still suffer from difficulty like the shortcuts and short branches combination problems, especially when dealing with the images with a complicated background. We introduce a new minima path-based model for minimally interactive tubular structure centerline extraction in conjunction with a perceptual grouping scheme. We take into account the prescribed tubular trajectories and the relevant curvature-penalized geodesic distances for minimal paths extraction in a graph-based optimization way. Experimental results on both synthetic and real images prove that the proposed model indeed obtains outperformance comparing to state-of-the-art minimal path-based tubular structure tracing algorithms.
Land cover mapping and monitoring are essential for understanding the environment and the effects of human activities on it. The automatic approaches to land cover mapping are predominantly based on the traditional machine learning that requires heuristic feature design. Such approaches are relatively slow and often suitable only for a particular type of satellite sensor or geographical area. Recently, deep learning has outperformed traditional machine learning approaches on a range of image processing tasks including image classification and segmentation. In this study, we demonstrated the suitability of deep learning models for wide-area land cover mapping using satellite C-band SAR images. We used a set of 14 ESA Sentinel-1 scenes acquired during the summer season in Finland representative of the land cover in the country. These imageries were used as an input to seven state-of-the-art deep-learning models for semantic segmentation, namely U-Net, DeepLabV3+, PSPNet, BiSeNet, SegNet, FC-DenseNet, and FRRN-B. These models were pre-trained on the ImageNet dataset and further fine-tuned in this study. CORINE land cover map produced by the Finnish Environment Institute was used as a reference, and the models were trained to distinguish between 5 Level-1 CORINE classes. Upon the evaluation and benchmarking, we found that all the models demonstrated solid performance, with the top FC-DenseNet model achieving an overall accuracy of 90.7%. These results indicate the suitability of deep learning methods to support efficient wide-area mapping using satellite SAR imagery.
Intersection of adversarial learning and satellite image processing is an emerging field in remote sensing. In this study, we intend to address synthesis of high resolution multi-spectral satellite imagery using adversarial learning. Guided by the discovery of attention mechanism, we regulate the process of band synthesis through spatio-spectral Laplacian attention. Further, we use Wasserstein GAN with gradient penalty norm to improve training and stability of adversarial learning. In this regard, we introduce a new cost function for the discriminator based on spatial attention and domain adaptation loss. We critically analyze the qualitative and quantitative results compared with state-of-the-art methods using widely adopted evaluation metrics. Our experiments on datasets of three different sensors, namely LISS-3, LISS-4, and WorldView-2 show that attention learning performs favorably against state-of-the-art methods. Using the proposed method we provide an additional data product in consistent with existing high resolution bands. Furthermore, we synthesize over 4000 high resolution scenes covering various terrains to analyze scientific fidelity. At the end, we demonstrate plausible large scale real world applications of the synthesized band.
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional-GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-arts on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.
By developing sophisticated image priors or designing deep(er) architectures, a variety of image Super-Resolution (SR) approaches have been proposed recently and achieved very promising performance. A natural question that arises is whether these methods can be reformulated into a unifying framework and whether this framework assists in SR reconstruction? In this paper, we present a simple but effective single image SR method based on ensemble learning, which can produce a better performance than that could be obtained from any of SR methods to be ensembled (or called component super-resolvers). Based on the assumption that better component super-resolver should have larger ensemble weight when performing SR reconstruction, we present a Maximum A Posteriori (MAP) estimation framework for the inference of optimal ensemble weights. Specially, we introduce a reference dataset, which is composed of High-Resolution (HR) and Low-Resolution (LR) image pairs, to measure the super-resolution abilities (prior knowledge) of different component super-resolvers. To obtain the optimal ensemble weights, we propose to incorporate the reconstruction constraint, which states that the degenerated HR image should be equal to the LR observation one, as well as the prior knowledge of ensemble weights into the MAP estimation framework. Moreover, the proposed optimization problem can be solved by an analytical solution. We study the performance of the proposed method by comparing with different competitive approaches, including four state-of-the-art non-deep learning based methods, four latest deep learning based methods and one ensemble learning based method, and prove its effectiveness and superiority on three public datasets.
Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. But, compared to other clinical imaging modalities (e.g., CT or ultrasound), the data acquisition process for MRI is inherently slow. Furthermore, dynamic applications demand collecting a series of images in quick succession. As a result, reducing acquisition time and improving imaging quality for undersampled datasets have been active areas of research for the last two decades. The combination of parallel imaging and compressive sensing (CS) has been shown to benefit a wide range of MRI applications. More recently, deep learning techniques have been shown to outperform CS methods. Some of these techniques pose the MRI reconstruction as a direct inversion problem and tackle it by training a deep neural network (DNN) to map from the measured Fourier samples and the final image. Considering that the forward model in MRI changes from one dataset to the next, such methods have to be either trained over a large and diverse corpus of data or limited to a specific application, and even then they cannot ensure data consistency. An alternative is to use "plug-and-play" (PnP) algorithms, which iterate image denoising with forward-model based signal recovery. PnP algorithms are an excellent fit for compressive MRI because they decouple image modeling from the forward model, which can change significantly among different scans due to variations in the coil sensitivity maps, sampling patterns, and image resolution. Consequently, with PnP, state-of-the-art image-denoising techniques, such as those based on DNNs, can be directly exploited for compressive MRI image reconstruction. The objective of this article is two-fold: i) to review recent advances in plug-and-play methods, and ii) to discuss their application to compressive MRI image reconstruction.
The development of biometric recognition solutions able to work in visual surveillance conditions, i.e., in unconstrained data acquisition conditions and under covert protocols has been motivating growing efforts from the research community. Among the various laboratories, schools and research institutes concerned about this problem, the SOCIA: Soft Computing and Image Analysis Lab., of the University of Beira Interior, Portugal, has been among the most active in pursuing disruptive solutions for obtaining such extremely ambitious kind of automata. This report summarises the research works published by elements of the SOCIA Lab. in the last decade in the scope of biometric recognition in unconstrained conditions. The idea is that it can be used as basis for someone wishing to entering in this research topic.
Limited angle problem is a challenging issue in x-ray computed tomography (CT) field. Iterative reconstruction methods that utilize the additional prior can suppress artifacts and improve image quality, but unfortunately require increased computation time. An interesting way is to restrain the artifacts in the images reconstructed from the practical filtered back projection (FBP) method. Frikel and Quinto have proved that the streak artifacts in FBP results could be characterized. It indicates that the artifacts created by FBP method have specific and similar characteristics in a stationary limited-angle scanning configuration. Based on this understanding, this work aims at developing a method to extract and suppress specific artifacts of FBP reconstructions for limited-angle tomography. A data-driven learning-based method is proposed based on a deep convolutional neural network. An end-to-end mapping between the FBP and artifact-free images is learned and the implicit features involving artifacts will be extracted and suppressed via nonlinear mapping. The qualitative and quantitative evaluations of experimental results indicate that the proposed method show a stable and prospective performance on artifacts reduction and detail recovery for limited angle tomography. The presented strategy provides a simple and efficient approach for improving image quality of the reconstruction results from limited projection data.
Cross-scene model adaption is a crucial feature for camera relocalization applied in real scenarios. It is preferable that a pre-learned model can be quickly deployed in a novel scene with as little training as possible. The existing state-of-the-art approaches, however, can hardly support few-shot scene adaption due to the entangling of image feature extraction and 3D coordinate regression, which requires a large-scale of training data. To address this issue, inspired by how humans relocalize, we approach camera relocalization with a decoupled solution where feature extraction, coordinate regression and pose estimation are performed separately. Our key insight is that robust and discriminative image features used for coordinate regression should be learned by removing the distracting factor of camera views, because coordinates in the world reference frame are obviously independent of local views. In particular, we employ a deep neural network to learn view-factorized pixel-wise features using several training scenes. Given a new scene, we train a view-dependent per-pixel 3D coordinate regressor while keeping the feature extractor fixed. Such a decoupled design allows us to adapt the entire model to novel scene and achieve accurate camera pose estimation with only few-shot training samples and two orders of magnitude less training time than the state-of-the-arts.