In multi-label learning, the issue of missing labels brings a major challenge. Many methods attempt to recovery missing labels by exploiting low-rank structure of label matrix. However, these methods just utilize global low-rank label structure, ignore both local low-rank label structures and label discriminant information to some extent, leaving room for further performance improvement. In this paper, we develop a simple yet effective discriminant multi-label learning (DM2L) method for multi-label learning with missing labels. Specifically, we impose the low-rank structures on all the predictions of instances from the same labels (local shrinking of rank), and a maximally separated structure (high-rank structure) on the predictions of instances from different labels (global expanding of rank). In this way, these imposed low-rank structures can help modeling both local and global low-rank label structures, while the imposed high-rank structure can help providing more underlying discriminability. Our subsequent theoretical analysis also supports these intuitions. In addition, we provide a nonlinear extension via using kernel trick to enhance DM2L and establish a concave-convex objective to learn these models. Compared to the other methods, our method involves the fewest assumptions and only one hyper-parameter. Even so, extensive experiments show that our method still outperforms the state-of-the-art methods.
Unsupervised domain adaptation (UDA) is an emerging research topic in the field of machine learning and pattern recognition, which aims to help the learning of unlabeled target domain by transferring knowledge from the source domain. To perform UDA, a variety of methods have been proposed, most of which concentrate on the scenario of single source and single target domain (1S1T). However, in real applications, usually single source domain with multiple target domains are involved (1SmT), which cannot be handled directly by those 1S1T models. Unfortunately, although a few related works on 1SmT UDA have been proposed, nearly none of them model the source domain knowledge and leverage the target-relatedness jointly. To overcome these shortcomings, we herein propose a more general 1SmT UDA model through transferring both the Source-Knowledge and Target-Relatedness, UDA-SKTR for short. In this way, not only the supervision knowledge from the source domain, but also the potential relatedness among the target domains are simultaneously modeled for exploitation in the process of 1SmT UDA. In addition, we construct an alternating optimization algorithm to solve the variables of the proposed model with convergence guarantee. Finally, through extensive experiments on both benchmark and real datasets, we validate the effectiveness and superiority of the proposed method.
Domain adaptation (DA) is an emerging research topic in the field of machine learning and pattern recognition, which aims to assist the learning of target domains by transferring model knowledge from the source domains. To perform DA, a variety of methods have been proposed, most of which concentrate on the scenario of single source and single target domain (1S1T). However, in real applications, usually multiple domains, especially target domains, are involved, which cannot be handled directly by those 1S1T models. Although related works on multi-target domains have been proposed, they are quite rare, and more unfortunately, nearly none of them model the source domain knowledge and leverage the target-relatedness jointly. To overcome these shortcomings, in this paper we propose a kind of DA model through TrAnsferring both the source-KnowlEdge and TargEt-Relatedness, DATAKETER for short. In this way, not only the supervision knowledge from the source domain, but also the potential relatedness among the target domains are simultaneously modeled for exploitation in the process of 1SmT DA. In addition, we construct an alternating optimization algorithm to solve the variables of the proposed model with convergence guarantee. Finally, through extensive experiments on both benchmark and real datasets, we validate the effectiveness and superiority of the proposed method.
As a newly emerging unsupervised learning paradigm, self-supervised learning (SSL) recently gained widespread attention, which usually introduces a pretext task without manual annotation of data. With its help, SSL effectively learns the feature representation beneficial for downstream tasks. Thus the pretext task plays a key role. However, the study of its design, especially its essence currently is still open. In this paper, we borrow a multi-view perspective to decouple a class of popular pretext tasks into a combination of view data augmentation (VDA) and view label classification (VLC), where we attempt to explore the essence of such pretext task while providing some insights into its design. Specifically, a simple multi-view learning framework is specially designed (SSL-MV), which assists the feature learning of downstream tasks (original view) through the same tasks on the augmented views. SSL-MV focuses on VDA while abandons VLC, empirically uncovering that it is VDA rather than generally considered VLC that dominates the performance of such SSL. Additionally, thanks to replacing VLC with VDA tasks, SSL-MV also enables an integrated inference combining the predictions from the augmented views, further improving the performance. Experiments on several benchmark datasets demonstrate its advantages.
In the process of exploring the world, the curiosity constantly drives humans to cognize new things. Supposing you are a zoologist, for a presented animal image, you can recognize it immediately if you know its class. Otherwise, you would more likely attempt to cognize it by exploiting the side-information (e.g., semantic information, etc.) you have accumulated. Inspired by this, this paper decomposes the generalized zero-shot learning (G-ZSL) task into an open set recognition (OSR) task and a zero-shot learning (ZSL) task, where OSR recognizes seen classes (if we have seen (or known) them) and rejects unseen classes (if we have never seen (or known) them before), while ZSL identifies the unseen classes rejected by the former. Simultaneously, without violating OSR's assumptions (only known class knowledge is available in training), we also first attempt to explore a new generalized open set recognition (G-OSR) by introducing the accumulated side-information from known classes to OSR. For G-ZSL, such a decomposition effectively solves the class overfitting problem with easily misclassifying unseen classes as seen classes. The problem is ubiquitous in most existing G-ZSL methods. On the other hand, for G-OSR, introducing such semantic information of known classes not only improves the recognition performance but also endows OSR with the cognitive ability of unknown classes. Specifically, a visual and semantic prototypes-jointly guided convolutional neural network (VSG-CNN) is proposed to fulfill these two tasks (G-ZSL and G-OSR) in a unified end-to-end learning framework. Extensive experiments on benchmark datasets demonstrate the advantages of our learning framework.
Alternating direction method of multipliers (ADMM) is a popular optimization tool for the composite and constrained problems in machine learning. However, in many machine learning problems such as black-box attacks and bandit feedback, ADMM could fail because the explicit gradients of these problems are difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems due to that the objective function values are only required in the optimization. Recently, though there exist a few zeroth-order ADMM methods, they build on the convexity of objective function. Clearly, these existing zeroth-order methods are limited in many applications. In the paper, thus, we propose a class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both the ZO-SVRG-ADMM and ZO-SAGA-ADMM have convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In particular, our methods not only reach the best convergence rate $O(1/T)$ for the nonconvex optimization, but also are able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct the experiments of black-box binary classification and structured adversarial attack on black-box deep neural network to validate the efficiency of our algorithms.
Nowadays, multi-view clustering has attracted more and more attention. To date, almost all the previous studies assume that views are complete. However, in reality, it is often the case that each view may contain some missing instances. Such incompleteness makes it impossible to directly use traditional multi-view clustering methods. In this paper, we propose a Doubly Aligned Incomplete Multi-view Clustering algorithm (DAIMC) based on weighted semi-nonnegative matrix factorization (semi-NMF). Specifically, on the one hand, DAIMC utilizes the given instance alignment information to learn a common latent feature matrix for all the views. On the other hand, DAIMC establishes a consensus basis matrix with the help of $L_{2,1}$-Norm regularized regression for reducing the influence of missing instances. Consequently, compared with existing methods, besides inheriting the strength of semi-NMF with ability to handle negative entries, DAIMC has two unique advantages: 1) solving the incomplete view problem by introducing a respective weight matrix for each view, making it able to easily adapt to the case with more than two views; 2) reducing the influence of view incompleteness on clustering by enforcing the basis matrices of individual views being aligned with the help of regression. Experiments on four real-world datasets demonstrate its advantages.
Real data are often with multiple modalities or from multiple heterogeneous sources, thus forming so-called multi-view data, which receives more and more attentions in machine learning. Multi-view clustering (MVC) becomes its important paradigm. In real-world applications, some views often suffer from instances missing. Clustering on such multi-view datasets is called incomplete multi-view clustering (IMC) and quite challenging. To date, though many approaches have been developed, most of them are offline and have high computational and memory costs especially for large scale datasets. To address this problem, in this paper, we propose an One-Pass Incomplete Multi-view Clustering framework (OPIMC). With the help of regularized matrix factorization and weighted matrix factorization, OPIMC can relatively easily deal with such problem. Different from the existing and sole online IMC method, OPIMC can directly get clustering results and effectively determine the termination of iteration process by introducing two global statistics. Finally, extensive experiments conducted on four real datasets demonstrate the efficiency and effectiveness of the proposed OPIMC method.
Proximal gradient method has been playing an important role to solve many machine learning tasks, especially for the nonsmooth problems. However, in some machine learning problems such as the bandit model and the black-box learning problem, proximal gradient method could fail because the explicit gradients of these problems are difficult or infeasible to obtain. The gradient-free (zeroth-order) method can address these problems because only the objective function values are required in the optimization. Recently, the first zeroth-order proximal stochastic algorithm was proposed to solve the nonconvex nonsmooth problems. However, its convergence rate is $O(\frac{1}{\sqrt{T}})$ for the nonconvex problems, which is significantly slower than the best convergence rate $O(\frac{1}{T})$ of the zeroth-order stochastic algorithm, where $T$ is the iteration number. To fill this gap, in the paper, we propose a class of faster zeroth-order proximal stochastic methods with the variance reduction techniques of SVRG and SAGA, which are denoted as ZO-ProxSVRG and ZO-ProxSAGA, respectively. In theoretical analysis, we address the main challenge that an unbiased estimate of the true gradient does not hold in the zeroth-order case, which was required in previous theoretical analysis of both SVRG and SAGA. Moreover, we prove that both ZO-ProxSVRG and ZO-ProxSAGA algorithms have $O(\frac{1}{T})$ convergence rates. Finally, the experimental results verify that our algorithms have a faster convergence rate than the existing zeroth-order proximal stochastic algorithm.