This paper presents a novel quadratic projection based feature extraction framework, where a set of quadratic matrices is learned to distinguish each class from all other classes. We formulate quadratic matrix learning (QML) as a standard semidefinite programming (SDP) problem. However, the con- ventional interior-point SDP solvers do not scale well to the problem of QML for high-dimensional data. To solve the scalability of QML, we develop an efficient algorithm, termed DualQML, based on the Lagrange duality theory, to extract nonlinear features. To evaluate the feasibility and effectiveness of the proposed framework, we conduct extensive experiments on biometric recognition. Experimental results on three representative biometric recogni- tion tasks, including face, palmprint, and ear recognition, demonstrate the superiority of the DualQML-based feature extraction algorithm compared to the current state-of-the-art algorithms
Sparse representation has attracted much attention from researchers in fields of signal processing, image processing, computer vision and pattern recognition. Sparse representation also has a good reputation in both theoretical research and practical applications. Many different algorithms have been proposed for sparse representation. The main purpose of this article is to provide a comprehensive study and an updated review on sparse representation and to supply a guidance for researchers. The taxonomy of sparse representation methods can be studied from various viewpoints. For example, in terms of different norm minimizations used in sparsity constraints, the methods can be roughly categorized into five groups: sparse representation with $l_0$-norm minimization, sparse representation with $l_p$-norm (0$<$p$<$1) minimization, sparse representation with $l_1$-norm minimization and sparse representation with $l_{2,1}$-norm minimization. In this paper, a comprehensive overview of sparse representation is provided. The available sparse representation algorithms can also be empirically categorized into four groups: greedy strategy approximation, constrained optimization, proximity algorithm-based optimization, and homotopy algorithm-based sparse representation. The rationales of different algorithms in each category are analyzed and a wide range of sparse representation applications are summarized, which could sufficiently reveal the potential nature of the sparse representation theory. Specifically, an experimentally comparative study of these sparse representation algorithms was presented. The Matlab code used in this paper can be available at: http://www.yongxu.org/lunwen.html.
In this paper, we propose an effective scene text recognition method using sparse coding based features, called Histograms of Sparse Codes (HSC) features. For character detection, we use the HSC features instead of using the Histograms of Oriented Gradients (HOG) features. The HSC features are extracted by computing sparse codes with dictionaries that are learned from data using K-SVD, and aggregating per-pixel sparse codes to form local histograms. For word recognition, we integrate multiple cues including character detection scores and geometric contexts in an objective function. The final recognition results are obtained by searching for the words which correspond to the maximum value of the objective function. The parameters in the objective function are learned using the Minimum Classification Error (MCE) training method. Experiments on several challenging datasets demonstrate that the proposed HSC-based scene text recognition method outperforms HOG-based methods significantly and outperforms most state-of-the-art methods.
Image/video data is usually represented with multiple visual features. Fusion of multi-source information for establishing the attributes has been widely recognized. Multi-feature visual recognition has recently received much attention in multimedia applications. This paper studies visual understanding via a newly proposed l_2-norm based multi-feature shared learning framework, which can simultaneously learn a global label matrix and multiple sub-classifiers with the labeled multi-feature data. Additionally, a group graph manifold regularizer composed of the Laplacian and Hessian graph is proposed for better preserving the manifold structure of each feature, such that the label prediction power is much improved through the semi-supervised learning with global label consistency. For convenience, we call the proposed approach Global-Label-Consistent Classifier (GLCC). The merits of the proposed method include: 1) the manifold structure information of each feature is exploited in learning, resulting in a more faithful classification owing to the global label consistency; 2) a group graph manifold regularizer based on the Laplacian and Hessian regularization is constructed; 3) an efficient alternative optimization method is introduced as a fast solver owing to the convex sub-problems. Experiments on several benchmark visual datasets for multimedia understanding, such as the 17-category Oxford Flower dataset, the challenging 101-category Caltech dataset, the YouTube & Consumer Videos dataset and the large-scale NUS-WIDE dataset, demonstrate that the proposed approach compares favorably with the state-of-the-art algorithms. An extensive experiment on the deep convolutional activation features also show the effectiveness of the proposed approach. The code is available on http://www.escience.cn/people/lei/index.html
Deep learning with a convolutional neural network (CNN) has been proved to be very effective in feature extraction and representation of images. For image classification problems, this work aim at finding which classifier is more competitive based on high-level deep features of images. In this report, we have discussed the nearest neighbor, support vector machines and extreme learning machines for image classification under deep convolutional activation feature representation. Specifically, we adopt the benchmark object recognition dataset from multiple sources with domain bias for evaluating different classifiers. The deep features of the object dataset are obtained by a well-trained CNN with five convolutional layers and three fully-connected layers on the challenging ImageNet. Experiments demonstrate that the ELMs outperform SVMs in cross-domain recognition tasks. In particular, state-of-the-art results are obtained by kernel ELM which outperforms SVMs with about 4% of the average accuracy. The features and codes are available in http://www.escience.cn/people/lei/index.html
This paper addresses an important issue, known as sensor drift that behaves a nonlinear dynamic property in electronic nose (E-nose), from the viewpoint of machine learning. Traditional methods for drift compensation are laborious and costly due to the frequent acquisition and labeling process for gases samples recalibration. Extreme learning machines (ELMs) have been confirmed to be efficient and effective learning techniques for pattern recognition and regression. However, ELMs primarily focus on the supervised, semi-supervised and unsupervised learning problems in single domain (i.e. source domain). To our best knowledge, ELM with cross-domain learning capability has never been studied. This paper proposes a unified framework, referred to as Domain Adaptation Extreme Learning Machine (DAELM), which learns a robust classifier by leveraging a limited number of labeled data from target domain for drift compensation as well as gases recognition in E-nose systems, without loss of the computational efficiency and learning ability of traditional ELM. In the unified framework, two algorithms called DAELM-S and DAELM-T are proposed for the purpose of this paper, respectively. In order to percept the differences among ELM, DAELM-S and DAELM-T, two remarks are provided. Experiments on the popular sensor drift data with multiple batches collected by E-nose system clearly demonstrate that the proposed DAELM significantly outperforms existing drift compensation methods without cumbersome measures, and also bring new perspectives for ELM.
Distance metric learning aims to learn from the given training data a valid distance metric, with which the similarity between data samples can be more effectively evaluated for classification. Metric learning is often formulated as a convex or nonconvex optimization problem, while many existing metric learning algorithms become inefficient for large scale problems. In this paper, we formulate metric learning as a kernel classification problem, and solve it by iterated training of support vector machines (SVM). The new formulation is easy to implement, efficient in training, and tractable for large-scale problems. Two novel metric learning models, namely Positive-semidefinite Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the global optimality of their solutions. Experimental results on UCI dataset classification, handwritten digit recognition, face verification and person re-identification demonstrate that the proposed metric learning methods achieve higher classification accuracy than state-of-the-art methods and they are significantly more efficient in training.
By coding a query sample as a sparse linear combination of all training samples and then classifying it by evaluating which class leads to the minimal coding residual, sparse representation based classification (SRC) leads to interesting results for robust face recognition. It is widely believed that the l1- norm sparsity constraint on coding coefficients plays a key role in the success of SRC, while its use of all training samples to collaboratively represent the query sample is rather ignored. In this paper we discuss how SRC works, and show that the collaborative representation mechanism used in SRC is much more crucial to its success of face classification. The SRC is a special case of collaborative representation based classification (CRC), which has various instantiations by applying different norms to the coding residual and coding coefficient. More specifically, the l1 or l2 norm characterization of coding residual is related to the robustness of CRC to outlier facial pixels, while the l1 or l2 norm characterization of coding coefficient is related to the degree of discrimination of facial features. Extensive experiments were conducted to verify the face recognition accuracy and efficiency of CRC with different instantiations.
In this paper, we present a simple yet fast and robust algorithm which exploits the spatio-temporal context for visual tracking. Our approach formulates the spatio-temporal relationships between the object of interest and its local context based on a Bayesian framework, which models the statistical correlation between the low-level features (i.e., image intensity and position) from the target and its surrounding regions. The tracking problem is posed by computing a confidence map, and obtaining the best target location by maximizing an object location likelihood function. The Fast Fourier Transform is adopted for fast learning and detection in this work. Implemented in MATLAB without code optimization, the proposed tracker runs at 350 frames per second on an i7 machine. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods in terms of efficiency, accuracy and robustness.
Learning a distance metric from the given training samples plays a crucial role in many machine learning tasks, and various models and optimization algorithms have been proposed in the past decade. In this paper, we generalize several state-of-the-art metric learning methods, such as large margin nearest neighbor (LMNN) and information theoretic metric learning (ITML), into a kernel classification framework. First, doublets and triplets are constructed from the training samples, and a family of degree-2 polynomial kernel functions are proposed for pairs of doublets or triplets. Then, a kernel classification framework is established, which can not only generalize many popular metric learning methods such as LMNN and ITML, but also suggest new metric learning methods, which can be efficiently implemented, interestingly, by using the standard support vector machine (SVM) solvers. Two novel metric learning methods, namely doublet-SVM and triplet-SVM, are then developed under the proposed framework. Experimental results show that doublet-SVM and triplet-SVM achieve competitive classification accuracies with state-of-the-art metric learning methods such as ITML and LMNN but with significantly less training time.