Given a graph over which the contagions (e.g. virus, gossip) propagate, leveraging subgraphs with highly correlated nodes is beneficial to many applications. Yet, challenges abound. First, the propagation pattern between a pair of nodes may change in various temporal dimensions. Second, not always the same contagion is propagated. Hence, state-of-the-art text mining approaches ranging from similarity measures to topic-modeling cannot use the textual contents to compute the weights between the nodes. Third, the word-word co-occurrence patterns may differ in various temporal dimensions, which increases the difficulty to employ current word embedding approaches. We argue that inseparable multi-aspect temporal collaborations are inevitably needed to better calculate the correlation metrics in dynamical processes. In this work, we showcase a sophisticated framework that on the one hand, integrates a neural network based time-aware word embedding component that can collectively construct the word vectors through an assembly of infinite latent temporal facets, and on the other hand, uses an elaborate generative model to compute the edge weights through heterogeneous temporal attributes. After computing the intra-nodes weights, we utilize our Max-Heap Graph cutting algorithm to exploit subgraphs. We then validate our model through comprehensive experiments on real-world propagation data. The results show that the knowledge gained from the versatile temporal dynamics is not only indispensable for word embedding approaches but also plays a significant role in the understanding of the propagation behaviors. Finally, we demonstrate that compared with other rivals, our model can dominantly exploit the subgraphs with highly coordinated nodes.
We investigated the feature map inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth (why do DNNs perform better than shallow models?) and the interpretation of DNNs (what do intermediate layers do?) Despite the rapid development in their application, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are not faithful. Inspired by the integral representation of shallow NNs, which is the continuum limit of the width, or the hidden unit number, we developed the flow representation and transport analysis of DNNs. The flow representation is the continuum limit of the depth or the hidden layer number, and it is specified by an ordinary differential equation with a vector field. We interpret an ordinary DNN as a transport map or a Euler broken line approximation of the flow. Technically speaking, a dynamical system is a natural model for the nested feature maps. In addition, it opens a new way to the coordinate-free treatment of DNNs by avoiding the redundant parametrization of DNNs. Following Wasserstein geometry, we analyze a flow in three aspects: dynamical system, continuity equation, and Wasserstein gradient flow. A key finding is that we specified a series of transport maps of the denoising autoencoder (DAE). Starting from the shallow DAE, this paper develops three topics: the transport map of the deep DAE, the equivalence between the stacked DAE and the composition of DAEs, and the development of the double continuum limit or the integral representation of the flow representation. As partial answers to the research questions, we found that deeper DAEs converge faster and the extracted features are better; in addition, a deep Gaussian DAE transports mass to decrease the Shannon entropy of the data distribution.
How much can pruning algorithms teach us about the fundamentals of learning representations in neural networks? And how much can these fundamentals help while devising new pruning techniques? A lot, it turns out. Neural network pruning has become a topic of great interest in recent years, and many different techniques have been proposed to address this problem. The decision of what to prune and when to prune necessarily forces us to confront our assumptions about how neural networks actually learn to represent patterns in data. In this work, we set out to test several long-held hypotheses about neural network learning representations, approaches to pruning and the relevance of one in the context of the other. To accomplish this, we argue in favor of pruning whole neurons as opposed to the traditional method of pruning weights from optimally trained networks. We first review the historical literature, point out some common assumptions it makes, and propose methods to demonstrate the inherent flaws in these assumptions. We then propose our novel approach to pruning and set about analyzing the quality of the decisions it makes. Our analysis led us to question the validity of many widely-held assumptions behind pruning algorithms and the trade-offs we often make in the interest of reducing computational complexity. We discovered that there is a straightforward way, however expensive, to serially prune 40-70% of the neurons in a trained network with minimal effect on the learning representation and without any re-training. It is to be noted here that the motivation behind this work is not to propose an algorithm that would outperform all existing methods, but to shed light on what some inherent flaws in these methods can teach us about learning representations and how this can lead us to superior pruning techniques.
Dictionary learning is a challenge topic in many image processing areas. The basic goal is to learn a sparse representation from an overcomplete basis set. Due to combining the advantages of generic multiscale representations with learning based adaptivity, multiscale dictionary representation approaches have the power in capturing structural characteristics of natural images. However, existing multiscale learning approaches still suffer from three main weaknesses: inadaptability to diverse scales of image data, sensitivity to noise and outliers, difficulty to determine optimal dictionary structure. In this paper, we present a novel multiscale dictionary learning paradigm for sparse image representations based on an improved empirical mode decomposition. This powerful data-driven analysis tool for multi-dimensional signal can fully adaptively decompose the image into multiscale oscillating components according to intrinsic modes of data self. This treatment can obtain a robust and effective sparse representation, and meanwhile generates a raw base dictionary at multiple geometric scales and spatial frequency bands. This dictionary is refined by selecting optimal oscillating atoms based on frequency clustering. In order to further enhance sparsity and generalization, a tolerance dictionary is learned using a coherence regularized model. A fast proximal scheme is developed to optimize this model. The multiscale dictionary is considered as the product of oscillating dictionary and tolerance dictionary. Experimental results demonstrate that the proposed learning approach has the superior performance in sparse image representations as compared with several competing methods. We also show the promising results in image denoising application.
As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus. Phrase mining is important in various tasks such as information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus likely have unsatisfactory performance on text corpora of new domains and genres without extra but expensive adaption. Recently, a few data-driven methods have been developed successfully for extraction of phrases from massive domain-specific text. However, none of the state-of-the-art models is fully automated because they require human experts for designing rules or labeling phrases. Since one can easily obtain many quality phrases from public knowledge bases to a scale that is much larger than that produced by human experts, in this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which leverages this large amount of high-quality phrases in an effective way and achieves better performance compared to limited human labeled phrases. In addition, we develop a POS-guided phrasal segmentation model, which incorporates the shallow syntactic information in part-of-speech (POS) tags to further enhance the performance, when a POS tagger is available. Note that, AutoPhrase can support any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, the new method has shown significant improvements in effectiveness on five real-world datasets across different domains and languages.
Over the past few decades, non-monotonic reasoning has developed to be one of the most important topics in computational logic and artificial intelligence. Different ways to introduce non-monotonic aspects to classical logic have been considered, e.g., extension with default rules, extension with modal belief operators, or modification of the semantics. In this survey we consider a logical formalism from each of the above possibilities, namely Reiter's default logic, Moore's autoepistemic logic and McCarthy's circumscription. Additionally, we consider abduction, where one is not interested in inferences from a given knowledge base but in computing possible explanations for an observation with respect to a given knowledge base. Complexity results for different reasoning tasks for propositional variants of these logics have been studied already in the nineties. In recent years, however, a renewed interest in complexity issues can be observed. One current focal approach is to consider parameterized problems and identify reasonable parameters that allow for FPT algorithms. In another approach, the emphasis lies on identifying fragments, i.e., restriction of the logical language, that allow more efficient algorithms for the most important reasoning tasks. In this survey we focus on this second aspect. We describe complexity results for fragments of logical languages obtained by either restricting the allowed set of operators (e.g., forbidding negations one might consider only monotone formulae) or by considering only formulae in conjunctive normal form but with generalized clause types. The algorithmic problems we consider are suitable variants of satisfiability and implication in each of the logics, but also counting problems, where one is not only interested in the existence of certain objects (e.g., models of a formula) but asks for their number.
A key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities. Currently, only a small fraction of measurements can be assigned identities. Two complementary computational approaches have emerged to address the annotation problem: mapping candidate molecules to spectra, and mapping query spectra to molecular candidates. In essence, the candidate molecule with the spectrum that best explains the query spectrum is recommended as the target molecule. Despite candidate ranking being fundamental in both approaches, no prior works utilized rank learning tasks in determining the target molecule. We propose a novel machine learning model, Ensemble Spectral Prediction (ESP), for metabolite annotation. ESP takes advantage of prior neural network-based annotation models that utilize multilayer perceptron (MLP) networks and Graph Neural Networks (GNNs). Based on the ranking results of the MLP and GNN-based models, ESP learns a weighting for the outputs of MLP and GNN spectral predictors to generate a spectral prediction for a query molecule. Importantly, training data is stratified by molecular formula to provide candidate sets during model training. Further, baseline MLP and GNN models are enhanced by considering peak dependencies through multi-head attention mechanism and multi-tasking on spectral topic distributions. ESP improves average rank by 41% and 30% over the MLP and GNN baselines, respectively, demonstrating remarkable performance gain over state-of-the-art neural network approaches. We show that annotation performance, for ESP and other models, is a strong function of the number of molecules in the candidate set and their similarity to the target molecule.
Teachers often conduct surveys in order to collect data from a predefined group of students to gain insights into topics of interest. When analyzing surveys with open-ended textual responses, it is extremely time-consuming, labor-intensive, and difficult to manually process all the responses into an insightful and comprehensive report. In the analysis step, traditionally, the teacher has to read each of the responses and decide on how to group them in order to extract insightful information. Even though it is possible to group the responses only using certain keywords, such an approach would be limited since it not only fails to account for embedded contexts but also cannot detect polysemous words or phrases and semantics that are not expressible in single words. In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data. Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors. The encoded vectors then get clustered either into an optimally tuned number of groups or into a set of groups with pre-specified titles. In the former case, the clusters are then further analyzed to extract a representative set of keywords or summary sentences that serve as the labels of the clusters. In our framework, for the designated clusters, we finally provide context-aware wordclouds that demonstrate the semantically prominent keywords within each group. Honoring user privacy, we have successfully built the on-device implementation of our framework suitable for real-time analysis on mobile devices and have tested it on a synthetic dataset. Our framework reduces the costs at-scale by automating the process of extracting the most insightful information pieces from survey data.
Unsupervised graph-level representation learning plays a crucial role in a variety of tasks such as molecular property prediction and community analysis, especially when data annotation is expensive. Currently, most of the best-performing graph embedding methods are based on Infomax principle. The performance of these methods highly depends on the selection of negative samples and hurt the performance, if the samples were not carefully selected. Inter-graph similarity-based methods also suffer if the selected set of graphs for similarity matching is low in quality. To address this, we focus only on utilizing the current input graph for embedding learning. We are motivated by an observation from real-world graph generation processes where the graphs are formed based on one or more global factors which are common to all elements of the graph (e.g., topic of a discussion thread, solubility level of a molecule). We hypothesize extracting these common factors could be highly beneficial. Hence, this work proposes a new principle for unsupervised graph representation learning: Graph-wise Common latent Factor EXtraction (GCFX). We further propose a deep model for GCFX, deepGCFX, based on the idea of reversing the above-mentioned graph generation process which could explicitly extract common latent factors from an input graph and achieve improved results on downstream tasks to the current state-of-the-art. Through extensive experiments and analysis, we demonstrate that, while extracting common latent factors is beneficial for graph-level tasks to alleviate distractions caused by local variations of individual nodes or local neighbourhoods, it also benefits node-level tasks by enabling long-range node dependencies, especially for disassortative graphs.
Mobile edge computing (MEC) has been envisioned as a promising paradigm to handle the massive volume of data generated from ubiquitous mobile devices for enabling intelligent services with the help of artificial intelligence (AI). Traditionally, AI techniques often require centralized data collection and training in a single entity, e.g., an MEC server, which is now becoming a weak point due to data privacy concerns and high data communication overheads. In this context, federated learning (FL) has been proposed to provide collaborative data training solutions, by coordinating multiple mobile devices to train a shared AI model without exposing their data, which enjoys considerable privacy enhancement. To improve the security and scalability of FL implementation, blockchain as a ledger technology is attractive for realizing decentralized FL training without the need for any central server. Particularly, the integration of FL and blockchain leads to a new paradigm, called FLchain, which potentially transforms intelligent MEC networks into decentralized, secure, and privacy-enhancing systems. This article presents an overview of the fundamental concepts and explores the opportunities of FLchain in MEC networks. We identify several main topics in FLchain design, including communication cost, resource allocation, incentive mechanism, security and privacy protection. The key solutions for FLchain design are provided, and the lessons learned as well as the outlooks are also discussed. Then, we investigate the applications of FLchain in popular MEC domains, such as edge data sharing, edge content caching and edge crowdsensing. Finally, important research challenges and future directions are also highlighted.