In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers not only to accurately classify the seen classes, but also to effectively deal with the unseen ones. This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, experiment setup and evaluation metrics. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. Additionally, we also overview the open world recognition which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.
In open set recognition (OSR), almost all existing methods are designed specially for recognizing individual instances, even these instances are collectively coming in batch. Recognizers in decision either reject or categorize them to some known class using empirically-set threshold. Thus the threshold plays a key role, however, the selection for it usually depends on the knowledge of known classes, inevitably incurring risks due to lacking available information from unknown classes. On the other hand, a more realistic OSR system should NOT just rest on a reject decision but should go further, especially for discovering the hidden unknown classes among the reject instances, whereas existing OSR methods do not pay special attention. In this paper, we introduce a novel collective/batch decision strategy with an aim to extend existing OSR for new class discovery while considering correlations among the testing instances. Specifically, a collective decision-based OSR framework (CD-OSR) is proposed by slightly modifying the Hierarchical Dirichlet process (HDP). Thanks to the HDP, our CD-OSR does not need to define the specific threshold and can automatically reserve space for unknown classes in testing, naturally resulting in a new class discovery function. Finally, extensive experiments on benchmark datasets indicate the validity of CD-OSR.
Unsupervised domain adaptation (UDA) aims to learn the unlabeled target domain by transferring the knowledge of the labeled source domain. To date, most of the existing works focus on the scenario of one source domain and one target domain (1S1T), and just a few works concern the scenario of multiple source domains and one target domain (mS1T). While, to the best of our knowledge, almost no work concerns the scenario of one source domain and multiple target domains (1SmT), in which these unlabeled target domains may not necessarily share the same categories, therefore, contrasting to mS1T, 1SmT is more challenging. Accordingly, for such a new UDA scenario, we propose a UDA framework through the model parameter adaptation (PA-1SmT). A key ingredient of PA-1SmT is to transfer knowledge through adaptive learning of a common model parameter dictionary, which is completely different from existing popular methods for UDA, such as subspace alignment, distribution matching etc., and can also be directly used for DA of privacy protection due to the fact that the knowledge is transferred just via the model parameters rather than data itself. Finally, our experimental results on three domain adaptation benchmark datasets demonstrate the superiority of our framework.
Feature missing is a serious problem in many applications, which may lead to low quality of training data and further significantly degrade the learning performance. While feature acquisition usually involves special devices or complex process, it is expensive to acquire all feature values for the whole dataset. On the other hand, features may be correlated with each other, and some values may be recovered from the others. It is thus important to decide which features are most informative for recovering the other features as well as improving the learning performance. In this paper, we try to train an effective classification model with least acquisition cost by jointly performing active feature querying and supervised matrix completion. When completing the feature matrix, a novel target function is proposed to simultaneously minimize the reconstruction error on observed entries and the supervised loss on training data. When querying the feature value, the most uncertain entry is actively selected based on the variance of previous iterations. In addition, a bi-objective optimization method is presented for cost-aware active selection when features bear different acquisition costs. The effectiveness of the proposed approach is well validated by both theoretical analysis and experimental study.
With the large rising of complex data, the nonconvex models such as nonconvex loss function and nonconvex regularizer are widely used in machine learning and pattern recognition. In this paper, we propose a class of mini-batch stochastic ADMMs (alternating direction method of multipliers) for solving large-scale nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch size, the mini-batch stochastic ADMM without variance reduction (VR) technique is convergent and reaches a convergence rate of $O(1/T)$ to obtain a stationary point of the nonconvex optimization, where $T$ denotes the number of iterations. Moreover, we extend the mini-batch stochastic gradient method to both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript \cite{huang2016stochastic}, and prove these mini-batch stochastic ADMMs also reaches the convergence rate of $O(1/T)$ without condition on the mini-batch size. In particular, we provide a specific parameter selection for step size $\eta$ of stochastic gradients and penalty parameter $\rho$ of augmented Lagrangian function. Finally, extensive experimental results on both simulated and real-world data demonstrate the effectiveness of the proposed algorithms.
In the paper, we study the stochastic alternating direction method of multipliers (ADMM) for the nonconvex optimizations, and propose three classes of the nonconvex stochastic ADMM with variance reduction, based on different reduced variance stochastic gradients. Specifically, the first class called the nonconvex stochastic variance reduced gradient ADMM (SVRG-ADMM), uses a multi-stage scheme to progressively reduce the variance of stochastic gradients. The second is the nonconvex stochastic average gradient ADMM (SAG-ADMM), which additionally uses the old gradients estimated in the previous iteration. The third called SAGA-ADMM is an extension of the SAG-ADMM method. Moreover, under some mild conditions, we establish the iteration complexity bound of $O(1/\epsilon)$ of the proposed methods to obtain an $\epsilon$-stationary solution of the nonconvex optimizations. In particular, we provide a general framework to analyze the iteration complexity of these nonconvex stochastic ADMM methods with variance reduction. Finally, some numerical experiments demonstrate the effectiveness of our methods.
In this paper, we study the stochastic gradient descent (SGD) method for the nonconvex nonsmooth optimization, and propose an accelerated SGD method by combining the variance reduction technique with Nesterov's extrapolation technique. Moreover, based on the local error bound condition, we establish the linear convergence of our method to obtain a stationary point of the nonconvex optimization. In particular, we prove that not only the sequence generated linearly converges to a stationary point of the problem, but also the corresponding sequence of objective values is linearly convergent. Finally, some numerical experiments demonstrate the effectiveness of our method. To the best of our knowledge, it is first proved that the accelerated SGD method converges linearly to the local minimum of the nonconvex optimization.
In human face-based biometrics, gender classification and age estimation are two typical learning tasks. Although a variety of approaches have been proposed to handle them, just a few of them are solved jointly, even so, these joint methods do not yet specifically concern the semantic difference between human gender and age, which is intuitively helpful for joint learning, consequently leaving us a room of further improving the performance. To this end, in this work we firstly propose a general learning framework for jointly estimating human gender and age by specially attempting to formulate such semantic relationships as a form of near-orthogonality regularization and then incorporate it into the objective of the joint learning framework. In order to evaluate the effectiveness of the proposed framework, we exemplify it by respectively taking the widely used binary-class SVM for gender classification, and two threshold-based ordinal regression methods (i.e., the discriminant learning for ordinal regression and support vector ordinal regression) for age estimation, and crucially coupling both through the proposed semantic formulation. Moreover, we develop its kernelized nonlinear counterpart by deriving a representer theorem for the joint learning strategy. Finally, through extensive experiments on three aging datasets FG-NET, Morph Album I and Morph Album II, we demonstrate the effectiveness and superiority of the proposed joint learning strategy.
Human age estimation has attracted increasing researches due to its wide applicability in such as security monitoring and advertisement recommendation. Although a variety of methods have been proposed, most of them focus only on the age-specific facial appearance. However, biological researches have shown that not only gender but also the aging difference between the male and the female inevitably affect the age estimation. To our knowledge, so far there have been two methods that have concerned the gender factor. The first is a sequential method which first classifies the gender and then performs age estimation respectively for classified male and female. Although it promotes age estimation performance because of its consideration on the gender semantic difference, an accumulation risk of estimation errors is unavoidable. To overcome drawbacks of the sequential strategy, the second is to regress the age appended with the gender by concatenating their labels as two dimensional output using Partial Least Squares (PLS). Although leading to promotion of age estimation performance, such a concatenation not only likely confuses the semantics between the gender and age, but also ignores the aging discrepancy between the male and the female. In order to overcome their shortcomings, in this paper we propose a unified framework to perform gender-aware age estimation. The proposed method considers and utilizes not only the semantic relationship between the gender and the age, but also the aging discrepancy between the male and the female. Finally, experimental results demonstrate not only the superiority of our method in performance, but also its good interpretability in revealing the aging discrepancy.
One major challenge in computer vision is to go beyond the modeling of individual objects and to investigate the bi- (one-versus-one) or tri- (one-versus-two) relationship among multiple visual entities, answering such questions as whether a child in a photo belongs to given parents. The child-parents relationship plays a core role in a family and understanding such kin relationship would have fundamental impact on the behavior of an artificial intelligent agent working in the human world. In this work, we tackle the problem of one-versus-two (tri-subject) kinship verification and our contributions are three folds: 1) a novel relative symmetric bilinear model (RSBM) introduced to model the similarity between the child and the parents, by incorporating the prior knowledge that a child may resemble a particular parent more than the other; 2) a spatially voted method for feature selection, which jointly selects the most discriminative features for the child-parents pair, while taking local spatial information into account; 3) a large scale tri-subject kinship database characterized by over 1,000 child-parents families. Extensive experiments on KinFaceW, Family101 and our newly released kinship database show that the proposed method outperforms several previous state of the art methods, while could also be used to significantly boost the performance of one-versus-one kinship verification when the information about both parents are available.