Abstract:Existing nonnegative matrix factorization methods focus on learning the global structure of the data to construct basis and coefficient matrices, which ignores the local structure that commonly exists among data. In this paper, we propose a new type of nonnegative matrix factorization method, which learns local similarity and clustering in a mutually enhancing way. The learned representation is more representative in that it better reveals the inherent geometric properties of the data. A nonlinear expansion is given and efficient multiplicative updates are developed with theoretical convergence guarantees. Extensive experimental results confirm the effectiveness of the proposed model.
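As a point of reference for the multiplicative updates mentioned above, the following is a minimal sketch of the classical Frobenius-norm NMF updates (Lee and Seung), not the locality-aware model proposed in the abstract. The function name `nmf_multiplicative`, the toy data, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Plain Frobenius-norm NMF, X ~ W @ H, via multiplicative updates.

    This is the standard baseline scheme, shown only to illustrate what a
    multiplicative update looks like; it has no local-similarity term.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is multiplied elementwise by a nonnegative ratio,
        # so W and H remain nonnegative without any projection step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Hypothetical toy data: a nonnegative matrix of rank 4.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 15))
W, H, errors = nmf_multiplicative(X, rank=4)
```

The small constant `eps` only guards against division by zero; the reconstruction error recorded in `errors` should shrink over the iterations on data like this.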
Abstract:In current clinical practice, electroencephalograms (EEG) are reviewed and analyzed by well-trained neurologists to support therapeutic decisions. Manual review is labor-intensive and error-prone. Automatic and accurate seizure/nonseizure classification methods are needed. One major problem is that the EEG signals for the seizure and nonseizure states exhibit considerable variation. To capture essential seizure features, this paper integrates an emerging deep learning model, the independently recurrent neural network (IndRNN), with a dense structure and an attention mechanism to exploit discriminative temporal and spatial features and overcome seizure variability. The dense structure ensures maximum information flow between layers. The attention mechanism captures spatial features. Evaluations are performed in cross-validation experiments over the noisy CHB-MIT data set. The obtained average sensitivity, specificity, and precision of 88.80%, 88.60%, and 88.69%, respectively, are better than those of the current state-of-the-art methods. In addition, we explore how the segment length affects the classification performance. Thirteen different segment lengths are assessed, showing that the classification performance varies with the segment length, with a maximal fluctuation of more than 4%. Thus, the segment length is an important factor influencing the classification performance.
Abstract:Robust principal component analysis (RPCA) has drawn significant attention due to its powerful capability in recovering low-rank matrices as well as its successful applications in various real-world problems. The current state-of-the-art algorithms usually need to compute the singular value decomposition of large matrices, which generally has at least quadratic or even cubic complexity. This drawback has limited the application of RPCA in solving real-world problems. To combat this drawback, in this paper we propose a new type of RPCA method, RES-PCA, which is linearly efficient and scalable in both data size and dimension. For comparison, AltProj, an existing scalable approach to RPCA, requires precise knowledge of the true rank; otherwise, it may fail to recover low-rank matrices. By contrast, our method works with or without knowing the true rank; even when both methods work, our method is faster. Extensive experiments have been performed and testify to the effectiveness of the proposed method both quantitatively and in visual quality, which suggests that our method is suitable for use as a lightweight, scalable component for RPCA in any application pipeline.
Abstract:We introduce a discriminative regression approach to supervised classification in this paper. It estimates a representation model while accounting for discriminativeness between classes, thereby enabling accurate derivation of categorical information. This new type of regression model extends existing models such as ridge, lasso, and group lasso by explicitly incorporating discriminative information. As a special case, we focus on a quadratic model that admits a closed-form analytical solution. The corresponding classifier is called the discriminative regression machine (DRM). Three iterative algorithms are further established for the DRM to enhance its efficiency and scalability for real applications. Our approach and the algorithms are applicable to general types of data including images, high-dimensional data, and imbalanced data. We compare the DRM with current state-of-the-art classifiers. Our extensive experimental results show superior performance of the DRM and confirm the effectiveness of the proposed approach.
Abstract:In current clinical practice, electroencephalograms (EEG) are reviewed and analyzed by trained neurologists to support therapeutic decisions. Manual review can be laborious and error-prone. Automatic and accurate seizure/non-seizure classification methods are desirable. A critical challenge is that seizure morphologies exhibit considerable variability. To capture essential seizure features, this paper leverages an emerging deep learning model, the independently recurrent neural network (IndRNN), to construct a new approach for seizure/non-seizure classification. This new approach gradually expands the time scales, thereby extracting temporal and spatial features from the local time duration up to the entire record. Evaluations are conducted with cross-validation experiments across subjects over the noisy CHB-MIT data. Experimental results demonstrate that the proposed approach outperforms the current state-of-the-art methods. In addition, we explore how the segment length affects the classification performance. Thirteen different segment lengths are assessed, showing that the classification performance varies with the segment length, with a maximal fluctuation of more than 4%. Thus, the segment length is an important factor influencing the classification performance.
Abstract:In this paper, we propose an auto-encoder based generative neural network model whose encoder compresses the inputs into vectors in the tangent space of a special Lie group manifold: upper triangular positive definite affine transform matrices (UTDATs). UTDATs are representations of Gaussian distributions and can straightforwardly generate Gaussian distributed samples. Therefore, the encoder is trained together with a decoder (generator) which takes Gaussian distributed latent vectors as input. Compared with related generative models such as the variational auto-encoder, the proposed model incorporates information on the geometric properties of Gaussian distributions. As a special case, we derive an exponential mapping layer for diagonal Gaussian UTDATs, which eliminates the matrix exponential operator required by the general exponential mapping in Lie group theory. Moreover, we derive an intrinsic loss for the UTDAT Lie group which can be calculated as an L2 loss in the tangent space. Furthermore, inspired by Lie group theory, we propose to use the Lie algebra vectors rather than the raw parameters (e.g., the mean) of Gaussian distributions as compressed representations of the original inputs. Experimental results verify the effectiveness of the proposed generative model and the benefits gained from the Lie group structural information of UTDATs.
Abstract:Edge features contain important information about graphs. However, current state-of-the-art neural network models designed for graph learning do not consider incorporating edge features, especially multi-dimensional edge features. In this paper, we propose an attention mechanism which combines both node features and edge features. Guided by the edge features, the attention mechanism on a pair of graph nodes will not only depend on node contents, but also adjust automatically with respect to the properties of the edge connecting these two nodes. Moreover, the edge features are adjusted by the attention function and fed to the next layer, which means our edge features are adaptive across network layers. As a result, our proposed adaptive edge features guided graph attention model can consolidate a rich source of graph information that current state-of-the-art graph learning methods cannot. We apply our proposed model to graph node classification, and experimental results on three citation network datasets and a biological network dataset show that our method outperforms the current state-of-the-art methods, testifying to the discriminative capability of edge features and the effectiveness of our adaptive edge features guided attention model. An additional ablation study further shows that both the edge features and the adaptiveness components contribute to our model.
Abstract:Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous label learning; and discretization of the learned labels by k-means clustering. This common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, a predefined similarity graph might not be optimal for the subsequent clustering. It is well accepted that the similarity graph strongly affects the clustering results. To this end, we propose to automatically learn similarity information from data and simultaneously consider the constraint that the similarity matrix has exactly c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution since the k-means method is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, those three subtasks are integrated into a unified framework, with each subtask iteratively boosted by using the results of the others towards an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernels. To tackle this practical problem of how to select the most suitable kernel for a particular data set, we further extend our model to incorporate multiple kernel learning ability. Extensive experiments demonstrate the superiority of our proposed method as compared to existing clustering approaches.
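The traditional three-step pipeline that the abstract criticizes can be sketched as follows. This is a minimal illustration of the baseline only, not the unified model proposed in the abstract; the Gaussian kernel, the bandwidth `sigma`, the row normalization, the farthest-point k-means initialization, and the toy data are all assumptions.

```python
import numpy as np

def kmeans(Y, k, n_iter=50):
    """Tiny Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_clustering(X, k, sigma=1.0):
    # Step 1: predefined similarity graph (Gaussian kernel, an assumed choice).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # Step 2: continuous labels = eigenvectors of the normalized Laplacian
    # belonging to the k smallest eigenvalues.
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    F = vecs[:, :k]
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    # Step 3: discretize the continuous labels with k-means.
    return kmeans(F, k)

# Hypothetical toy data: two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels = spectral_clustering(X, k=2)
```

Each step here is fixed before the next one runs, which is exactly the separation the abstract argues against: the graph in step 1 is never revised in light of the clustering obtained in step 3.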
Abstract:Many similarity-based clustering methods work in two separate steps: similarity matrix computation and subsequent spectral clustering. However, similarity measurement is challenging because it is usually affected by many factors, e.g., the choice of similarity metric, neighborhood size, scale of data, noise, and outliers. Thus the learned similarity matrix is often not suitable, let alone optimal, for the subsequent clustering. In addition, nonlinear similarity often exists in real-world data, yet most existing methods do not account for it effectively. To tackle these two challenges, we propose a model to simultaneously learn the cluster indicator matrix and similarity information in kernel spaces in a principled way. We show theoretical relationships to kernel k-means, k-means, and spectral clustering methods. Then, to address the practical issue of how to select the most suitable kernel for a particular clustering task, we further extend our model with a multiple kernel learning ability. With this joint model, we can automatically accomplish three subtasks: finding the best cluster indicator matrix, the most accurate similarity relations, and the optimal combination of multiple kernels. By leveraging the interactions between these three subtasks in a joint framework, each subtask can be iteratively boosted by using the results of the others towards an overall optimal solution. Extensive experiments are performed to demonstrate the effectiveness of our method.
Abstract:Recommender systems play an increasingly important role in online applications to help users find what they need or prefer. Collaborative filtering algorithms that generate predictions by analyzing the user-item rating matrix perform poorly when the matrix is sparse. To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and the intrinsic structural information of the user-item matrix. The proposed method constructs a new representation which preserves affinity and structure information in the user-item rating matrix and then performs the recommendation task. To capture proximity information about users and items, two graphs are constructed. The idea of manifold learning is used to constrain the new representation to be smooth on these graphs, so as to enforce user and item proximities. Our model is formulated as a convex optimization problem, for which we only need to solve the well-known Sylvester equation. We carry out extensive empirical evaluations on six benchmark datasets to show the effectiveness of this approach.
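To illustrate how a graph-smoothness model can reduce to a Sylvester equation, the sketch below assumes a hypothetical objective, min over Z of ||Z - X||_F^2 + alpha tr(Z^T L_u Z) + beta tr(Z L_i Z^T), where L_u and L_i are Laplacians of the user and item graphs. The abstract does not state its exact objective, so this form, the weights `alpha` and `beta`, the helper names, and the toy data are all assumptions. Setting the gradient to zero gives (I + alpha L_u) Z + Z (beta L_i) = X, which has the form A Z + Z B = C.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W of a symmetric affinity matrix W."""
    return np.diag(W.sum(axis=1)) - W

def solve_symmetric_sylvester(A, B, C):
    """Solve A Z + Z B = C for symmetric A and B via eigendecomposition.

    With A = U diag(lam) U^T and B = V diag(mu) V^T, the rotated unknown
    U^T Z V satisfies an elementwise equation with divisors lam_i + mu_j.
    """
    lam, U = np.linalg.eigh(A)
    mu, V = np.linalg.eigh(B)
    C_tilde = U.T @ C @ V
    Z_tilde = C_tilde / (lam[:, None] + mu[None, :])
    return U @ Z_tilde @ V.T

# Hypothetical toy ratings: 4 users x 3 items, with 0 standing in for missing.
X = np.array([[5., 0., 1.],
              [4., 0., 1.],
              [0., 2., 5.],
              [1., 0., 4.]])
# Hypothetical affinity graphs: users 0-1 and 2-3 are linked; items 1-2 are linked.
W_u = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W_i = np.array([[0, 0, 0],
                [0, 0, 1],
                [0, 1, 0]], dtype=float)

alpha, beta = 0.5, 0.5  # assumed smoothness weights
A = np.eye(4) + alpha * graph_laplacian(W_u)
B = beta * graph_laplacian(W_i)
Z = solve_symmetric_sylvester(A, B, X)
residual = np.linalg.norm(A @ Z + Z @ B - X)
```

Because the eigenvalues of A are at least 1 and those of B are nonnegative, every divisor is positive and the solution is unique. For non-symmetric coefficient matrices, a general-purpose routine such as `scipy.linalg.solve_sylvester` would be the more natural choice.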