Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Georgios B. Giannakis

Node Embedding with Adaptive Similarities for Scalable Learning over Graphs

Dec 03, 2018

Dimitris Berberidis, Georgios B. Giannakis

Figure 1 for Node Embedding with Adaptive Similarities for Scalable Learning over Graphs

Figure 2 for Node Embedding with Adaptive Similarities for Scalable Learning over Graphs

Figure 3 for Node Embedding with Adaptive Similarities for Scalable Learning over Graphs

Figure 4 for Node Embedding with Adaptive Similarities for Scalable Learning over Graphs

Abstract:Node embedding is the task of extracting informative and descriptive features over the nodes of a graph. The importance of node embeddings for graph analytics, as well as learning tasks such as node classification, link prediction and community detection, has led to increased interest on the problem leading to a number of recent advances. Much like PCA in the feature domain, node embedding is an inherently \emph{unsupervised} task; in lack of metadata used for validation, practical methods may require standardization and limiting the use of tunable hyperparameters. Finally, node embedding methods are faced with maintaining scalability in the face of large-scale real-world graphs of ever-increasing sizes. In the present work, we propose an adaptive node embedding framework that adjusts the embedding process to a given underlying graph, in a fully unsupervised manner. To achieve this, we adopt the notion of a tunable node similarity matrix that assigns weights on paths of different length. The design of the multilength similarities ensures that the resulting embeddings also inherit interpretable spectral properties. The proposed model is carefully studied, interpreted, and numerically evaluated using stochastic block models. Moreover, an algorithmic scheme is proposed for training the model parameters effieciently and in an unsupervised manner. We perform extensive node classification, link prediction, and clustering experiments on many real world graphs from various domains, and compare with state-of-the-art scalable and unsupervised node embedding alternatives. The proposed method enjoys superior performance in many cases, while also yielding interpretable information on the underlying structure of the graph.

Via

Access Paper or Ask Questions

Online Graph-Adaptive Learning with Scalability and Privacy

Dec 03, 2018

Yanning Shen, Geert Leus, Georgios B. Giannakis

Figure 1 for Online Graph-Adaptive Learning with Scalability and Privacy

Figure 2 for Online Graph-Adaptive Learning with Scalability and Privacy

Figure 3 for Online Graph-Adaptive Learning with Scalability and Privacy

Figure 4 for Online Graph-Adaptive Learning with Scalability and Privacy

Abstract:Graphs are widely adopted for modeling complex systems, including financial, biological, and social networks. Nodes in networks usually entail attributes, such as the age or gender of users in a social network. However, real-world networks can have very large size, and nodal attributes can be unavailable to a number of nodes, e.g., due to privacy concerns. Moreover, new nodes can emerge over time, which can necessitate real-time evaluation of their nodal attributes. In this context, the present paper deals with scalable learning of nodal attributes by estimating a nodal function based on noisy observations at a subset of nodes. A multikernel-based approach is developed which is scalable to large-size networks. Unlike most existing methods that re-solve the function estimation problem over all existing nodes whenever a new node joins the network, the novel method is capable of providing real-time evaluation of the function values on newly-joining nodes without resorting to a batch solver. Interestingly, the novel scheme only relies on an encrypted version of each node's connectivity in order to learn the nodal attributes, which promotes privacy. Experiments on both synthetic and real datasets corroborate the effectiveness of the proposed methods.

Via

Access Paper or Ask Questions

Graph Multiview Canonical Correlation Analysis

Nov 30, 2018

Jia Chen, Gang Wang, Georgios B. Giannakis

Figure 1 for Graph Multiview Canonical Correlation Analysis

Figure 2 for Graph Multiview Canonical Correlation Analysis

Figure 3 for Graph Multiview Canonical Correlation Analysis

Figure 4 for Graph Multiview Canonical Correlation Analysis

Abstract:Multiview canonical correlation analysis (MCCA) seeks latent low-dimensional representations encountered with multiview data of shared entities (a.k.a. common sources). However, existing MCCA approaches do not exploit the geometry of the common sources, which may be available \emph{a priori}, or can be constructed using certain domain knowledge. This prior information about the common sources can be encoded by a graph, and be invoked as a regularizer to enrich the maximum variance MCCA framework. In this context, the present paper's novel graph-regularized (G) MCCA approach minimizes the distance between the wanted canonical variables and the common low-dimensional representations, while accounting for graph-induced knowledge of the common sources. Relying on a function capturing the extent low-dimensional representations of the multiple views are similar, a generalization bound of GMCCA is established based on Rademacher's complexity. Tailored for setups where the number of data pairs is smaller than the data vector dimensions, a graph-regularized dual MCCA approach is also developed. To further deal with nonlinearities present in the data, graph-regularized kernel MCCA variants are put forward too. Interestingly, solutions of the graph-regularized linear, dual, and kernel MCCA, are all provided in terms of generalized eigenvalue decomposition. Several corroborating numerical tests using real datasets are provided to showcase the merits of the graph-regularized MCCA variants relative to several competing alternatives including MCCA, Laplacian-regularized MCCA, and (graph-regularized) PCA.

Via

Access Paper or Ask Questions

Real-time Power System State Estimation and Forecasting via Deep Neural Networks

Nov 30, 2018

Liang Zhang, Gang Wang, Georgios B. Giannakis

Figure 1 for Real-time Power System State Estimation and Forecasting via Deep Neural Networks

Figure 2 for Real-time Power System State Estimation and Forecasting via Deep Neural Networks

Figure 3 for Real-time Power System State Estimation and Forecasting via Deep Neural Networks

Figure 4 for Real-time Power System State Estimation and Forecasting via Deep Neural Networks

Abstract:Contemporary power grids are being challenged by rapid voltage fluctuations that are caused by large-scale deployment of renewable generation, electric vehicles, and demand response programs. In this context, monitoring the grid's operating conditions in real time becomes increasingly critical. With the emergent large scale and nonconvexity however, the existing power system state estimation (PSSE) schemes become computationally expensive or yield suboptimal performance. To bypass these hurdles, this paper advocates deep neural networks (DNNs) for real-time power system monitoring. By unrolling an iterative physics-based prox-linear solver, a novel model-specific DNN is developed for real-time PSSE with affordable training and minimal tuning effort. To further enable system awareness even ahead of the time horizon, as well as to endow the DNN-based estimator with resilience, deep recurrent neural networks (RNNs) are also pursued for power system state forecasting. Deep RNNs leverage the long-term nonlinear dependencies present in the historical voltage time series to enable forecasting, and they are easy to implement. Numerical tests showcase improved performance of the proposed DNN-based estimation and forecasting approaches compared with existing alternatives. In real load data experiments on the IEEE 118-bus benchmark system, the novel model-specific DNN-based PSSE scheme outperforms nearly by an order-of-magnitude the competing alternatives, including the widely adopted Gauss-Newton PSSE solver.

Via

Access Paper or Ask Questions

A Recurrent Graph Neural Network for Multi-Relational Data

Nov 19, 2018

Vassilis N. Ioannidis, Antonio G. Marques, Georgios B. Giannakis

Figure 1 for A Recurrent Graph Neural Network for Multi-Relational Data

Figure 2 for A Recurrent Graph Neural Network for Multi-Relational Data

Figure 3 for A Recurrent Graph Neural Network for Multi-Relational Data

Figure 4 for A Recurrent Graph Neural Network for Multi-Relational Data

Abstract:The era of data deluge has sparked the interest in graph-based learning methods in a number of disciplines such as sociology, biology, neuroscience, or engineering. In this paper, we introduce a graph recurrent neural network (GRNN) for scalable semi-supervised learning from multi-relational data. Key aspects of the novel GRNN architecture are the use of multi-relational graphs, the dynamic adaptation to the different relations via learnable weights, and the consideration of graph-based regularizers to promote smoothness and alleviate over-parametrization. Our ultimate goal is to design a powerful learning architecture able to: discover complex and highly non-linear data associations, combine (and select) multiple types of relations, and scale gracefully with respect to the size of the graph. Numerical tests with real data sets corroborate the design goals and illustrate the performance gains relative to competing alternatives.

* Submitted to ICASSP 2019

Via

Access Paper or Ask Questions

RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Nov 09, 2018

Liping Li, Wei Xu, Tianyi Chen, Georgios B. Giannakis, Qing Ling

Figure 1 for RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Figure 2 for RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Figure 3 for RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Figure 4 for RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Abstract:In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets at presence of an unknown number of Byzantine workers. The Byzantine workers, during the learning process, may send arbitrary incorrect messages to the master due to data corruptions, communication failures or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated with the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resultant subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most of the existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) on the workers, and hence fits for a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method, which is free of Byzantine attacks. Numerically, experiments on real dataset corroborate the competitive performance of RSA and a complexity reduction compared to the state-of-the-art alternatives.

* To appear in AAAI 2019

Via

Access Paper or Ask Questions

Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Oct 02, 2018

Jia Chen, Gang Wang, Georgios B. Giannakis

Figure 1 for Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Figure 2 for Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Figure 3 for Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Figure 4 for Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Abstract:Principal component analysis (PCA) is widely used for feature extraction and dimensionality reduction, with documented merits in diverse tasks involving high-dimensional data. Standard PCA copes with one dataset at a time, but it is challenged when it comes to analyzing multiple datasets jointly. In certain data science settings however, one is often interested in extracting the most discriminative information from one dataset of particular interest (a.k.a. target data) relative to the other(s) (a.k.a. background data). To this end, this paper puts forth a novel approach, termed discriminative (d) PCA, for such discriminative analytics of multiple datasets. Under certain conditions, dPCA is proved to be least-squares optimal in recovering the component vector unique to the target data relative to background data. To account for nonlinear data correlations, (linear) dPCA models for one or multiple background datasets are generalized through kernel-based learning. Interestingly, all dPCA variants admit an analytical solution obtainable with a single (generalized) eigenvalue decomposition. Finally, corroborating dimensionality reduction tests using both synthetic and real datasets are provided to validate the effectiveness of the proposed methods.

* there are some mistakes

Via

Access Paper or Ask Questions

Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection

Sep 22, 2018

Vassilis N. Ioannidis, Ahmed S. Zamzam, Georgios B. Giannakis, Nicholas D. Sidiropoulos

Figure 1 for Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection

Figure 2 for Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection

Figure 3 for Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection

Figure 4 for Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection

Abstract:Joint analysis of data from multiple information repositories facilitates uncovering the underlying structure in heterogeneous datasets. Single and coupled matrix-tensor factorization (CMTF) has been widely used in this context for imputation-based recommendation from ratings, social network, and other user-item data. When this side information is in the form of item-item correlation matrices or graphs, existing CMTF algorithms may fall short. Alleviating current limitations, we introduce a novel model coined coupled graph-tensor factorization (CGTF) that judiciously accounts for graph-related side information. The CGTF model has the potential to overcome practical challenges, such as missing slabs from the tensor and/or missing rows/columns from the correlation matrices. A novel alternating direction method of multipliers (ADMM) is also developed that recovers the nonnegative factors of CGTF. Our algorithm enjoys closed-form updates that result in reduced computational complexity and allow for convergence claims. A novel direction is further explored by employing the interpretable factors to detect graph communities having the tensor as side information. The resulting community detection approach is successful even when some links in the graphs are missing. Results with real data sets corroborate the merits of the proposed methods relative to state-of-the-art competing factorization techniques in providing recommendations and detecting communities.

* This paper is submitted to the IEEE Transactions on Knowledge and Data Engineering. A preliminary version of this work was accepted for presentation in the special track of GlobalSIP on Tensor Methods for Signal Processing and Machine Learning

Via

Access Paper or Ask Questions

Adaptive Diffusions for Scalable Learning over Graphs

Sep 05, 2018

Dimitris Berberidis, Athanasios N. Nikolakopoulos, Georgios B. Giannakis

Figure 1 for Adaptive Diffusions for Scalable Learning over Graphs

Figure 2 for Adaptive Diffusions for Scalable Learning over Graphs

Figure 3 for Adaptive Diffusions for Scalable Learning over Graphs

Figure 4 for Adaptive Diffusions for Scalable Learning over Graphs

Abstract:Diffusion-based classifiers such as those relying on the Personalized PageRank and the Heat kernel, enjoy remarkable classification accuracy at modest computational requirements. Their performance however is affected by the extent to which the chosen diffusion captures a typically unknown label propagation mechanism, that can be specific to the underlying graph, and potentially different for each class. The present work introduces a disciplined, data-efficient approach to learning class-specific diffusion functions adapted to the underlying network topology. The novel learning approach leverages the notion of "landing probabilities" of class-specific random walks, which can be computed efficiently, thereby ensuring scalability to large graphs. This is supported by rigorous analysis of the properties of the model as well as the proposed algorithms. Furthermore, a robust version of the classifier facilitates learning even in noisy environments. Classification tests on real networks demonstrate that adapting the diffusion function to the given graph and observed labels, significantly improves the performance over fixed diffusions; reaching -- and many times surpassing -- the classification accuracy of computationally heavier state-of-the-art competing methods, that rely on node embeddings and deep neural networks.

Via

Access Paper or Ask Questions

Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Aug 14, 2018

Gang Wang, Georgios B. Giannakis, Jie Chen

Figure 1 for Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Figure 2 for Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Figure 3 for Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Figure 4 for Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Abstract:Neural networks with ReLU activations have achieved great empirical success in various domains. However, existing results for learning ReLU networks either pose assumptions on the underlying data distribution being e.g. Gaussian, or require the network size and/or training size to be sufficiently large. In this context, the problem of learning a two-layer ReLU network is approached in a binary classification setting, where the data are linearly separable and a hinge loss criterion is adopted. Leveraging the power of random noise, this contribution presents a novel stochastic gradient descent (SGD) algorithm, which can provably train any single-hidden-layer ReLU network to attain global optimality, despite the presence of infinitely many bad local minima and saddle points in general. This result is the first of its kind, requiring no assumptions on the data distribution, training/network size, or initialization. Convergence of the resultant iterative algorithm to a global minimum is analyzed by establishing both an upper bound and a lower bound on the number of effective (non-zero) updates to be performed. Furthermore, generalization guarantees are developed for ReLU networks trained with the novel SGD. These guarantees highlight a fundamental difference (at least in the worst case) between learning a ReLU network as well as a leaky ReLU network in terms of sample complexity. Numerical tests using synthetic data and real images validate the effectiveness of the algorithm and the practical merits of the theory.

* 23 pages, 5 figures, work in progress

Via

Access Paper or Ask Questions