Graph representation learning has achieved great success in many areas, including e-commerce, chemistry, biology, etc. However, the fundamental problem of choosing the appropriate dimension of node embedding for a given graph still remains unsolved. The commonly used strategies for Node Embedding Dimension Selection (NEDS) based on grid search or empirical knowledge suffer from heavy computation and poor model performance. In this paper, we revisit NEDS from the perspective of minimum entropy principle. Subsequently, we propose a novel Minimum Graph Entropy (MinGE) algorithm for NEDS with graph data. To be specific, MinGE considers both feature entropy and structure entropy on graphs, which are carefully designed according to the characteristics of the rich information in them. The feature entropy, which assumes the embeddings of adjacent nodes to be more similar, connects node features and link topology on graphs. The structure entropy takes the normalized degree as basic unit to further measure the higher-order structure of graphs. Based on them, we design MinGE to directly calculate the ideal node embedding dimension for any graph. Finally, comprehensive experiments with popular Graph Neural Networks (GNNs) on benchmark datasets demonstrate the effectiveness and generalizability of our proposed MinGE.
Graph embedding is essential for graph mining tasks. With the prevalence of graph data in real-world applications, many methods have been proposed in recent years to learn high-quality graph embedding vectors various types of graphs. However, most existing methods usually randomly select the negative samples from the original graph to enhance the training data without considering the noise. In addition, most of these methods only focus on the explicit graph structures and cannot fully capture complex semantics of edges such as various relationships or asymmetry. In order to address these issues, we propose a robust and generalized framework for adversarial graph embedding based on generative adversarial networks. Inspired by generative adversarial network, we propose a robust and generalized framework for adversarial graph embedding, named AGE. AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution, and enables the discriminator and generator to jointly learn each node's robust and generalized representation. Based on this framework, we propose three models to handle three types of graph data and derive the corresponding optimization algorithms, i.e., UG-AGE and DG-AGE for undirected and directed homogeneous graphs, respectively, and HIN-AGE for heterogeneous information networks. Extensive experiments show that our methods consistently and significantly outperform existing state-of-the-art methods across multiple graph mining tasks, including link prediction, node classification, and graph reconstruction.
With recent advances in data collection from multiple sources, multi-view data has received significant attention. In multi-view data, each view represents a different perspective of data. Since label information is often expensive to acquire, multi-view clustering has gained growing interest, which aims to obtain better clustering solution by exploiting complementary and consistent information across all views rather than only using an individual view. Due to inevitable sensor failures, data in each view may contain error. Error often exhibits as noise or feature-specific corruptions or outliers. Multi-view data may contain any or combination of these error types. Blindly clustering multi-view data i.e., without considering possible error in view(s) could significantly degrade the performance. The goal of error-robust multi-view clustering is to obtain useful outcome even if the multi-view data is corrupted. Existing error-robust multi-view clustering approaches with explicit error removal formulation can be structured into five broad research categories - sparsity norm based approaches, graph based methods, subspace based learning approaches, deep learning based methods and hybrid approaches, this survey summarizes and reviews recent advances in error-robust clustering for multi-view data. Finally, we highlight the challenges and provide future research opportunities.
Along with the rapid expansion of information technology and digitalization of health data, there is an increasing concern on maintaining data privacy while garnering the benefits in medical field. Two critical challenges are identified: Firstly, medical data is naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Secondly, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that requires reconciliation. This paper aims to provide a generic Federated Multi-View Learning (FedMV) framework for multi-view data leakage prevention, which is based on different types of local data availability and enables to accommodate two types of problems: Vertical Federated Multi-View Learning (V-FedMV) and Horizontal Federated Multi-View Learning (H-FedMV). We experimented with real-world keyboard data collected from BiAffect study. The results demonstrated that the proposed FedMV approach can make full use of multi-view data in a privacy-preserving way, and both V-FedMV and H-FedMV methods perform better than their single-view and pairwise counterparts. Besides, the proposed model can be easily adapted to deal with multi-view sequential data in a federated environment, which has been modeled and experimentally studied. To the best of our knowledge, this framework is the first to consider both vertical and horizontal diversification in the multi-view setting, as well as their sequential federated learning.
With the advance of the multi-media and multi-modal data, multi-view clustering (MVC) has drawn increasing attentions recently. In this field, one of the most crucial challenges is that the characteristics and qualities of different views usually vary extensively. Therefore, it is essential for MVC methods to find an effective approach that handles the diversity of multiple views appropriately. To this end, a series of MVC methods focusing on how to integrate the loss from each view have been proposed in the past few years. Among these methods, the mainstream idea is assigning weights to each view and then combining them linearly. In this paper, inspired by the effectiveness of non-linear combination in instance learning and the auto-weighted approaches, we propose Non-Linear Fusion for Self-Paced Multi-View Clustering (NSMVC), which is totally different from the the conventional linear-weighting algorithms. In NSMVC, we directly assign different exponents to different views according to their qualities. By this way, the negative impact from the corrupt views can be significantly reduced. Meanwhile, to address the non-convex issue of the MVC model, we further define a novel regularizer-free modality of Self-Paced Learning (SPL), which fits the proposed non-linear model perfectly. Experimental results on various real-world data sets demonstrate the effectiveness of the proposed method.
Graph neural networks (GNNs) have been widely used in deep learning on graphs. They can learn effective node representations that achieve superior performances in graph analysis tasks such as node classification and node clustering. However, most methods ignore the heterogeneity in real-world graphs. Methods designed for heterogeneous graphs, on the other hand, fail to learn complex semantic representations because they only use meta-paths instead of meta-graphs. Furthermore, they cannot fully capture the content-based correlations between nodes, as they either do not use the self-attention mechanism or only use it to consider the immediate neighbors of each node, ignoring the higher-order neighbors. We propose a novel Higher-order Attribute-Enhancing (HAE) framework that enhances node embedding in a layer-by-layer manner. Under the HAE framework, we propose a Higher-order Attribute-Enhancing Graph Neural Network (HAEGNN) for heterogeneous network representation learning. HAEGNN simultaneously incorporates meta-paths and meta-graphs for rich, heterogeneous semantics, and leverages the self-attention mechanism to explore content-based nodes interactions. The unique higher-order architecture of HAEGNN allows examining the first-order as well as higher-order neighborhoods. Moreover, HAEGNN shows good explainability as it learns the importances of different meta-paths and meta-graphs. HAEGNN is also memory-efficient, for it avoids per meta-path based matrix calculation. Experimental results not only show HAEGNN superior performance against the state-of-the-art methods in node classification, node clustering, and visualization, but also demonstrate its superiorities in terms of memory efficiency and explainability.
Events are happening in real-world and real-time, which can be planned and organized for occasions, such as social gatherings, festival celebrations, influential meetings or sports activities. Social media platforms generate a lot of real-time text information regarding public events with different topics. However, mining social events is challenging because events typically exhibit heterogeneous texture and metadata are often ambiguous. In this paper, we first design a novel event-based meta-schema to characterize the semantic relatedness of social events and then build an event-based heterogeneous information network (HIN) integrating information from external knowledge base. Second, we propose a novel Pairwise Popularity Graph Convolutional Network, named as PP-GCN, based on weighted meta-path instance similarity and textual semantic representation as inputs, to perform fine-grained social event categorization and learn the optimal weights of meta-paths in different tasks. Third, we propose a streaming social event detection and evolution discovery framework for HINs based on meta-path similarity search, historical information about meta-paths, and heterogeneous DBSCAN clustering method. Comprehensive experiments on real-world streaming social text data are conducted to compare various social event detection and evolution discovery algorithms. Experimental results demonstrate that our proposed framework outperforms other alternative social event detection and evolution discovery techniques.