University of Surrey, UK
Abstract:Recently, hashing techniques have gained importance in large-scale retrieval tasks because of their retrieval speed. Most of the existing cross-view frameworks assume that data are well paired. However, the fully-paired multiview situation is not universal in real applications. The aim of the method proposed in this paper is to learn the hashing function for semi-paired cross-view retrieval tasks. To utilize the label information of partial data, we propose a semi-supervised hashing learning framework which jointly performs feature extraction and classifier learning. The experimental results on two datasets show that our method outperforms several state-of-the-art methods in terms of retrieval accuracy.
Abstract:In the domain of pattern recognition, using the CovDs (Covariance Descriptors) to represent data and taking the metrics of the resulting Riemannian manifold into account have been widely adopted for the task of image set classification. Recently, it has been proven that infinite-dimensional CovDs are more discriminative than their low-dimensional counterparts. However, the form of infinite-dimensional CovDs is implicit and the computational load is high. We propose a novel framework for representing image sets by approximating infinite-dimensional CovDs in the paradigm of the Nystr\"om method based on a Riemannian kernel. We start by modeling the images via CovDs, which lie on the Riemannian manifold spanned by SPD (Symmetric Positive Definite) matrices. We then extend the Nystr\"om method to the SPD manifold and obtain the approximations of CovDs in RKHS (Reproducing Kernel Hilbert Space). Finally, we approximate infinite-dimensional CovDs via these approximations. Empirically, we apply our framework to the task of image set classification. The experimental results obtained on three benchmark datasets show that our proposed approximate infinite-dimensional CovDs outperform the original CovDs.
Abstract:In image set classification, a considerable advance has been made by modeling the original image sets by second order statistics or linear subspace, which typically lie on the Riemannian manifold. Specifically, they are Symmetric Positive Definite (SPD) manifold and Grassmann manifold respectively, and some algorithms have been developed on them for classification tasks. Motivated by the inability of existing methods to extract discriminatory features for data on Riemannian manifolds, we propose a novel algorithm which combines multiple manifolds as the features of the original image sets. In order to fuse these manifolds, the well-studied Riemannian kernels have been utilized to map the original Riemannian spaces into high dimensional Hilbert spaces. A metric Learning method has been devised to embed these kernel spaces into a lower dimensional common subspace for classification. The state-of-the-art results achieved on three datasets corresponding to two different classification tasks, namely face recognition and object categorization, demonstrate the effectiveness of the proposed method.
Abstract:In the domain of image-set based classification, a considerable advance has been made by representing original image sets as covariance matrices which typical lie in a Riemannian manifold. Specifically, it is a Symmetric Positive Definite (SPD) manifold. Traditional manifold learning methods inevitably have the property of high computational complexity or weak performance of the feature representation. In order to overcome these limitations, we propose a very simple Riemannian manifold network for image set classification. Inspired by deep learning architectures, we design a fully connected layer to generate more novel, more powerful SPD matrices. However we exploit the rectifying layer to prevent the input SPD matrices from being singular. We also introduce a non-linear learning of the proposed network with an innovative objective function. Furthermore we devise a pooling layer to further reduce the redundancy of the input SPD matrices, and the log-map layer to project the SPD manifold to the Euclidean space. For learning the connection weights between the input layer and the fully connected layer, we use Two-directional two-dimensional Principal Component Analysis ((2D)2PCA) algorithm. The proposed Riemannian manifold network (RieMNet) avoids complex computing and can be built and trained extremely easy and efficient. We have also developed a deep version of RieMNet, named as DRieMNet. The proposed RieMNet and DRieMNet are evaluated on three tasks: video-based face recognition, set-based object categorization, and set-based cell identification. Extensive experimental results show the superiority of our method over the state-of-the-art.
Abstract:In this paper, we propose a novel deep learning network L1-(2D)2PCANet for face recognition, which is based on L1-norm-based two-directional two-dimensional principal component analysis (L1-(2D)2PCA). In our network, the role of L1-(2D)2PCA is to learn the filters of multiple convolution layers. After the convolution layers, we deploy binary hashing and block-wise histogram for pooling. We test our network on some benchmark facial datasets YALE, AR, Extended Yale B, LFW-a and FERET with CNN, PCANet, 2DPCANet and L1-PCANet as comparison. The results show that the recognition performance of L1-(2D)2PCANet in all tests is better than baseline networks, especially when there are outliers in the test data. Owing to the L1-norm, L1-2D2PCANet is robust to outliers and changes of the training images.
Abstract:In recent years, deep learning has become a very active research tool which is used in many image processing fields. In this paper, we propose an effective image fusion method using a deep learning framework to generate a single image which contains all the features from infrared and visible images. First, the source images are decomposed into base parts and detail content. Then the base parts are fused by weighted-averaging. For the detail content, we use a deep learning network to extract multi-layer features. Using these features, we use l_1-norm and weighted-average strategy to generate several candidates of the fused detail content. Once we get these candidates, the max selection strategy is used to get final fused detail content. Finally, the fused image will be reconstructed by combining the fused base part and detail content. The experimental results demonstrate that our proposed method achieves state-of-the-art performance in both objective assessment and visual quality. The Code of our fusion method is available at https://github.com/exceptionLi/imagefusion_deeplearning
Abstract:This paper investigates the evaluation of dense 3D face reconstruction from a single 2D image in the wild. To this end, we organise a competition that provides a new benchmark dataset that contains 2000 2D facial images of 135 subjects as well as their 3D ground truth face scans. In contrast to previous competitions or challenges, the aim of this new benchmark dataset is to evaluate the accuracy of a 3D dense face reconstruction algorithm using real, accurate and high-resolution 3D ground truth face scans. In addition to the dataset, we provide a standard protocol as well as a Python script for the evaluation. Last, we report the results obtained by three state-of-the-art 3D face reconstruction systems on the new benchmark dataset. The competition is organised along with the 2018 13th IEEE Conference on Automatic Face & Gesture Recognition.
Abstract:We propose a novel end-to-end semi-supervised adversarial framework to generate photorealistic face images of new identities with wide ranges of expressions, poses, and illuminations conditioned by a 3D morphable model. Previous adversarial style-transfer methods either supervise their networks with large volume of paired data or use unpaired data with a highly under-constrained two-way generative framework in an unsupervised fashion. We introduce pairwise adversarial supervision to constrain two-way domain adaptation by a small number of paired real and synthetic images for training along with the large volume of unpaired data. Extensive qualitative and quantitative experiments are performed to validate our idea. Generated face images of new identities contain pose, lighting and expression diversity and qualitative results show that they are highly constraint by the synthetic input image while adding photorealism and retaining identity information. We combine face images generated by the proposed method with the real data set to train face recognition algorithms. We evaluated the model on two challenging data sets: LFW and IJB-A. We observe that the generated images from our framework consistently improves over the performance of deep face recognition network trained with Oxford VGG Face dataset and achieves comparable results to the state-of-the-art.
Abstract:In this paper we propose a new approach to person re-identification using images and natural language descriptions. We propose a joint vision and language model based on CCA and CNN architectures to match across the two modalities as well as to enrich visual examples for which there are no language descriptions. We also introduce new annotations in the form of natural language descriptions for two standard Re-ID benchmarks, namely CUHK03 and VIPeR. We perform experiments on these two datasets with techniques based on CNN, hand-crafted features as well as LSTM for analysing visual and natural description data. We investigate and demonstrate the advantages of using natural language descriptions compared to attributes as well as CNN compared to LSTM in the context of Re-ID. We show that the joint use of language and vision can significantly improve the state-of-the-art performance on standard Re-ID benchmarks.
Abstract:In this paper, we show how a 3D Morphable Model (i.e. a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network. This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.