Face aging, which aims at aesthetically rendering a given face to predict its future appearance, has received significant research attention in recent years. Although great progress has been achieved with the success of Generative Adversarial Networks (GANs) in synthesizing realistic images, most existing GAN-based face aging methods have two main problems: 1) unnatural changes of high-level semantic information (e.g. facial attributes) due to the insufficient utilization of prior knowledge of input faces, and 2) distortions of low-level image content including ghosting artifacts and modifications in age-irrelevant regions. In this paper, we introduce A3GAN, an Attribute-Aware Attentive face aging model to address the above issues. Facial attribute vectors are regarded as the conditional information and embedded into both the generator and discriminator, encouraging synthesized faces to be faithful to attributes of corresponding inputs. To improve the visual fidelity of generation results, we leverage the attention mechanism to restrict modifications to age-related areas and preserve image details. Moreover, the wavelet packet transform is employed to capture textural features at multiple scales in the frequency space. Extensive experimental results demonstrate the effectiveness of our model in synthesizing photorealistic aged face images and achieving state-of-the-art performance on popular face aging datasets.
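To make the multi-scale frequency decomposition concrete, the following is a minimal 1-D sketch of a full wavelet packet tree. It assumes the Haar filter pair and a signal whose length is divisible by `2**levels`; the abstract does not specify the wavelet, the number of levels, or how the subbands enter the model, and the function names here are illustrative only.

```python
import math

def haar_split(signal):
    """Split an even-length signal into Haar low-pass and high-pass halves."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high

def wavelet_packet(signal, levels):
    """Return the 2**levels subbands of a full Haar wavelet packet tree.

    Unlike the plain wavelet transform, every subband (not only the
    low-pass one) is split again at each level.
    """
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands

def energy(values):
    """Sum of squared coefficients."""
    return sum(v * v for v in values)

# Assumed toy input: 8 samples decomposed over 2 levels gives 4 subbands.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
bands = wavelet_packet(signal, levels=2)
```

Because the Haar pair is orthonormal, the total energy of the subbands equals the energy of the input, which is a convenient sanity check on any implementation.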
Every segmentation algorithm has parameters that need to be adjusted in order to achieve good results. Evolving fuzzy systems for the adjustment of segmentation parameters have been proposed recently (evolving fuzzy image segmentation, EFIS [1]). However, like any other algorithm, EFIS suffers from a few limitations when used in practice. As a major drawback, EFIS depends on detection of the object of interest for feature calculation, a task that is highly application-dependent. In this paper, a new version of EFIS is proposed to overcome these limitations. The new EFIS, called self-configuring EFIS (SC-EFIS), uses available training data to auto-configure the parameters that are fixed in EFIS. In addition, the proposed SC-EFIS relies on a feature selection process that does not require the detection of a region of interest (ROI).
Local learning of sparse image models has proven to be very effective for solving inverse problems in many computer vision applications. To learn such models, the data samples are often clustered using the K-means algorithm with the Euclidean distance as a dissimilarity metric. However, the Euclidean distance may not always be a good dissimilarity measure for comparing data samples lying on a manifold. In this paper, we propose two algorithms for determining a local subset of training samples from which a good local model can be computed for reconstructing a given input test sample, where we take into account the underlying geometry of the data. The first algorithm, called Adaptive Geometry-driven Nearest Neighbor search (AGNN), is an adaptive scheme which can be seen as an out-of-sample extension of the replicator graph clustering method for local model learning. The second method, called Geometry-driven Overlapping Clusters (GOC), is a less complex nonadaptive alternative for training subset selection. The proposed AGNN and GOC methods are evaluated in image super-resolution, deblurring and denoising applications and shown to outperform spectral clustering, soft clustering, and geodesic distance based subset selection in most settings.
In this work, we present Detective, an attentive object detector that identifies objects in images in a sequential manner. Our network is based on an encoder-decoder architecture, where the encoder is a convolutional neural network, and the decoder is a convolutional recurrent neural network coupled with an attention mechanism. At each iteration, our decoder focuses on the relevant parts of the image using the attention mechanism, and then estimates the object's class and the bounding box coordinates. Current object detection models generate dense predictions and rely on post-processing to remove duplicate predictions. Detective is a sparse object detector that generates a single bounding box per object instance. However, training a sparse object detector is challenging, as it requires the model to reason at the instance level and not just at the class and spatial levels. We propose a training mechanism based on the Hungarian algorithm and a loss that balances the localization and classification tasks. This allows Detective to achieve promising results on the PASCAL VOC object detection dataset. Our experiments demonstrate that sparse object detection is possible and has great potential for future developments in applications where the order of the objects to be predicted is of interest.
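The matching step behind such a training mechanism can be illustrated as a minimum-cost assignment between predictions and ground-truth instances. The sketch below is not the authors' implementation: it replaces the Hungarian algorithm with an exhaustive search over permutations (same optimum, only practical for a handful of objects), and the cost function, the `box_weight` parameter, and the dictionary layout are all assumptions made for illustration.

```python
from itertools import permutations

def box_l1(a, b):
    """L1 distance between two boxes given as (x1, y1, x2, y2)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_cost(pred, target, box_weight=1.0):
    """Hypothetical pairwise cost: classification term plus weighted box term."""
    class_cost = 1.0 - pred["probs"][target["label"]]
    return class_cost + box_weight * box_l1(pred["box"], target["box"])

def best_assignment(preds, targets, box_weight=1.0):
    """Return (total_cost, mapping), where mapping[t] is the index of the
    prediction assigned to target t.

    Exhaustive search stand-in for the Hungarian algorithm; assumes
    len(preds) >= len(targets).
    """
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(targets)):
        total = sum(
            match_cost(preds[p], targets[t], box_weight)
            for t, p in enumerate(perm)
        )
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Assumed toy data: two predictions emitted in the "wrong" order.
preds = [
    {"probs": [0.1, 0.9], "box": (0.5, 0.5, 0.9, 0.9)},
    {"probs": [0.8, 0.2], "box": (0.1, 0.1, 0.4, 0.4)},
]
targets = [
    {"label": 0, "box": (0.1, 0.1, 0.4, 0.5)},
    {"label": 1, "box": (0.5, 0.5, 0.9, 0.9)},
]
total, mapping = best_assignment(preds, targets)
```

Once each ground-truth instance is tied to exactly one prediction, the classification and localization losses can be computed per matched pair, so the detector is not penalized for the order in which it emits objects.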
The hyperspectral image (HSI) unmixing task is essentially an inverse problem, which is commonly solved by optimization algorithms under a predefined (non-)linear mixture model. Although these optimization algorithms show impressive performance, they are computationally demanding, as they often rely on an iterative updating scheme. Recently, the rise of neural networks has inspired many learning-based algorithms in the unmixing literature. However, most of them lack interpretability and require a large training dataset. A natural question then arises: can one combine model-based and learning-based algorithms to obtain an interpretable and fast algorithm for the HSI unmixing problem? In this paper, we propose two novel network architectures, named U-ADMM-AENet and U-ADMM-BUNet, for abundance estimation and blind unmixing respectively, by combining the conventional optimization-model-based unmixing method with the emerging learning-based unmixing method. We first consider a linear mixture model with a sparsity constraint, then we unfold the Alternating Direction Method of Multipliers (ADMM) algorithm to construct the unmixing network structures. We also show that the unfolded structures have corresponding interpretations in the machine learning literature, which further supports the effectiveness of the proposed methods. Benefiting from this interpretation, the proposed networks can be initialized by incorporating prior information about the HSI data. Unlike traditional unfolding networks, the proposed networks are trained with a new strategy that better fits HSI applications. Extensive experiments show that the proposed methods achieve much faster convergence and competitive performance, even with a very small training set, when compared with state-of-the-art algorithms.
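For reference, the kind of iteration that gets unfolded can be written out for a generic sparse linear mixture model. The formulation below is an assumption, since the abstract does not state the exact objective: $y$ is an observed pixel spectrum, $A$ the endmember matrix, $x$ the abundance vector, $z$ an auxiliary copy of $x$, $u$ the scaled dual variable, and $\lambda$, $\rho$ the sparsity weight and penalty parameter. Physical constraints such as abundance nonnegativity are omitted.

```latex
\begin{align}
  \min_{x,\,z}\quad & \tfrac{1}{2}\,\lVert y - A x \rVert_2^2 + \lambda \lVert z \rVert_1
  \quad \text{s.t.}\quad x = z, \\
  x^{k+1} &= \bigl(A^{\top} A + \rho I\bigr)^{-1}\bigl(A^{\top} y + \rho\,(z^{k} - u^{k})\bigr), \\
  z^{k+1} &= S_{\lambda/\rho}\bigl(x^{k+1} + u^{k}\bigr),
  \qquad S_{\kappa}(v) = \operatorname{sign}(v)\,\max\bigl(\lvert v \rvert - \kappa,\, 0\bigr), \\
  u^{k+1} &= u^{k} + x^{k+1} - z^{k+1}.
\end{align}
```

Unfolding treats a fixed number $K$ of these iterations as $K$ network layers: the $x$-update is a linear layer, the $z$-update is a thresholding nonlinearity, and quantities such as the linear operators and the threshold $\lambda/\rho$ can be learned from data rather than hand-tuned.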
In this paper, we propose a complete framework to process images captured under uncontrolled lighting, especially low lighting. Working within the Logarithmic Image Processing (LIP) framework, we study two novel functional metrics: i) the LIP-multiplicative Asplund's metric, which is robust to object absorption variations, and ii) the LIP-additive Asplund's metric, which is robust to variations of source intensity and exposure time. We introduce noise-robust versions of these metrics. We demonstrate that the maps of their corresponding distances between an image and a reference template are linked to Mathematical Morphology, which facilitates their implementation. We assess them in various situations with different lightings and movements. Results show that these distance maps are robust to lighting variations. Importantly, they are effective at detecting patterns in low-contrast images with a template acquired under a different lighting.
Segmentation-based image coding methods provide high compression ratios compared with traditional image coding approaches, such as transform and subband coding, in low bit-rate compression applications. In this paper, a segmentation-based image coding method, namely the Binary Space Partition (BSP) scheme, which divides the desired image using a recursive procedure for coding, is presented. The BSP approach partitions the desired image recursively in a hierarchical manner by using bisecting lines selected from a collection of discrete candidate lines. This partitioning procedure generates a binary tree, which is referred to as the BSP-tree representation of the desired image. The algorithm is computationally expensive and has a long execution time. The time complexity of the BSP scheme is explored in this work.
Direct methods have recently emerged as an effective and efficient tool in automated medical image analysis and have become a popular way to solve diverse challenging tasks in clinical practice. Compared to traditional methods, direct methods are of much greater clinical significance because they directly target the final clinical goal rather than relying on any intermediate steps. These intermediate steps, e.g., segmentation, registration and tracking, are not actually necessary and are limited to very constrained tasks that are far from practical clinical applications; moreover, they are computationally expensive and time-consuming, which wastes considerable research resources. The advantages of direct methods stem from 1) removal of intermediate steps, e.g., segmentation, tracking and registration; 2) avoidance of user inputs and initialization; 3) reformulation of conventional challenging problems, e.g., inverse problems, with efficient solutions.
Image denoising based on a probabilistic model of local image patches has been employed by various researchers, and recently a deep (denoising) autoencoder has been proposed by Burger et al. [2012] and Xie et al. [2012] as a good model for this. In this paper, we propose that another popular family of models in the field of deep learning, called Boltzmann machines, can perform image denoising as well as or, in certain cases with high noise levels, better than denoising autoencoders. We empirically evaluate the two models on three different sets of images with different types and levels of noise. Throughout the experiments we also examine the effect of the depth of the models. The experiments confirm our claim and reveal that performance can be improved by adding more hidden layers, especially when the level of noise is high.
Visual tracking is fundamentally the problem of regressing the state of the target in each video frame. While significant progress has been achieved, trackers are still prone to failures and inaccuracies. It is therefore crucial to represent the uncertainty in the target estimation. Although current prominent paradigms rely on estimating a state-dependent confidence score, this value lacks a clear probabilistic interpretation, complicating its use. In this work, we therefore propose a probabilistic regression formulation and apply it to tracking. Our network predicts the conditional probability density of the target state given an input image. Crucially, our formulation is capable of modeling label noise stemming from inaccurate annotations and ambiguities in the task. The regression network is trained by minimizing the Kullback-Leibler divergence. When applied for tracking, our formulation not only allows a probabilistic representation of the output, but also substantially improves the performance. Our tracker sets a new state-of-the-art on six datasets, achieving 59.8% AUC on LaSOT and 75.8% Success on TrackingNet. The code and models are available at https://github.com/visionml/pytracking.
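The training objective can be illustrated on a discretized 1-D target state. This is a minimal sketch under stated assumptions, not the released code: the predicted density is taken to be a softmax over per-location scores, the label distribution is assumed to be a Gaussian centred on the annotation (one simple way to express label noise), and `grid`, `sigma`, and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn per-location scores into a normalized discrete density."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_label(grid, annotation, sigma):
    """Discretized, normalized Gaussian centred on the (possibly noisy) annotation."""
    weights = [math.exp(-0.5 * ((g - annotation) / sigma) ** 2) for g in grid]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same grid."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

# Assumed toy setup: 11 candidate states in [0, 1], annotation at 0.5.
grid = [i / 10 for i in range(11)]
label = gaussian_label(grid, annotation=0.5, sigma=0.1)

# An uninformative prediction (all scores equal) versus one whose scores
# follow the label's log-density up to a constant.
uniform_pred = softmax([0.0] * len(grid))
peaked_pred = softmax([-0.5 * ((g - 0.5) / 0.1) ** 2 for g in grid])

loss_uniform = kl_divergence(label, uniform_pred)
loss_peaked = kl_divergence(label, peaked_pred)
```

The loss is large for the uninformative prediction and vanishes when the predicted density matches the label distribution, so a wider `sigma` directly encodes less trust in the annotation.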