A novel Gromov-Wasserstein learning framework is proposed to jointly match (align) graphs and learn embedding vectors for the associated graph nodes. Using Gromov-Wasserstein discrepancy, we measure the dissimilarity between two graphs and find their correspondence, according to the learned optimal transport. The node embeddings associated with the two graphs are learned under the guidance of the optimal transport, the distance of which not only reflects the topological structure of each graph but also yields the correspondence across the graphs. These two learning steps are mutually-beneficial, and are unified here by minimizing the Gromov-Wasserstein discrepancy with structural regularizers. This framework leads to an optimization problem that is solved by a proximal point method. We apply the proposed method to matching problems in real-world networks, and demonstrate its superior performance compared to alternative approaches.
PoPPy is a Point Process toolbox based on PyTorch, which achieves flexible designing and efficient learning of point process models. It can be used for interpretable sequential data modeling and analysis, e.g., Granger causality analysis of multi-variate point processes, point process-based simulation and prediction of event sequences. In practice, the key points of point process-based sequential data modeling include: 1) How to design intensity functions to describe the mechanism behind observed data? 2) How to learn the proposed intensity functions from observed data? The goal of PoPPy is providing a user-friendly solution to the key points above and achieving large-scale point process-based sequential data analysis, simulation and prediction.
We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transports to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled underlying distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. Such a strategy provides the updating of word embeddings with robust guidance, improving the algorithmic convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the topics of admissions, obtaining superior performance on clinically-meaningful disease network construction, mortality prediction as a function of admission codes, and procedure recommendation.
Health risks from cigarette smoking -- the leading cause of preventable death in the United States -- can be substantially reduced by quitting. Although most smokers are motivated to quit, the majority of quit attempts fail. A number of studies have explored the role of self-reported symptoms, physiologic measurements, and environmental context on smoking risk, but less work has focused on the temporal dynamics of smoking events, including daily patterns and related nicotine effects. In this work, we examine these dynamics and improve risk prediction by modeling smoking as a self-triggering process, in which previous smoking events modify current risk. Specifically, we fit smoking events self-reported by 42 smokers to a time-varying semi-parametric Hawkes process (TV-SPHP) developed for this purpose. Results show that the TV-SPHP achieves superior prediction performance compared to related and existing models, with the incorporation of time-varying predictors having greatest benefit over longer prediction windows. Moreover, the impact function illustrates previously unknown temporal dynamics of smoking, with possible connections to nicotine metabolism to be explored in future work through a randomized study design. By more effectively predicting smoking events and exploring a self-triggering component of smoking risk, this work supports development of novel or improved cessation interventions that aim to reduce death from smoking.
The success of semi-supervised manifold learning is highly dependent on the quality of the labeled samples. Active manifold learning aims to select and label representative landmarks on a manifold from a given set of samples to improve semi-supervised manifold learning. In this paper, we propose a novel active manifold learning method based on a unified framework of manifold landmarking. In particular, our method combines geometric manifold landmarking methods with algebraic ones. We achieve this by using the Gershgorin circle theorem to construct an upper bound on the learning error that depends on the landmarks and the manifold's alignment matrix in a way that captures both the geometric and algebraic criteria. We then attempt to select landmarks so as to minimize this bound by iteratively deleting the Gershgorin circles corresponding to the selected landmarks. We also analyze the complexity, scalability, and robustness of our method through simulations, and demonstrate its superiority compared to existing methods. Experiments in regression and classification further verify that our method performs better than its competitors.
How to effectively approximate real-valued parameters with binary codes plays a central role in neural network binarization. In this work, we reveal an important fact that binarizing different layers has a widely-varied effect on the compression ratio of network and the loss of performance. Based on this fact, we propose a novel and flexible neural network binarization method by introducing the concept of layer-wise priority which binarizes parameters in inverse order of their layer depth. In each training step, our method selects a specific network layer, minimizes the discrepancy between the original real-valued weights and its binary approximations, and fine-tunes the whole network accordingly. During the iteration of the above process, it is significant that we can flexibly decide whether to binarize the remaining floating layers or not and explore a trade-off between the loss of performance and the compression ratio of model. The resulting binary network is applied for efficient pedestrian detection. Extensive experimental results on several benchmarks show that under the same compression ratio, our method achieves much lower miss rate and faster detection speed than the state-of-the-art neural network binarization method.
We consider the learning of multi-agent Hawkes processes, a model containing multiple Hawkes processes with shared endogenous impact functions and different exogenous intensities. In the framework of stochastic maximum likelihood estimation, we explore the associated risk bound. Further, we consider the superposition of Hawkes processes within the model, and demonstrate that under certain conditions such an operation is beneficial for tightening the risk bound. Accordingly, we propose a stochastic optimization algorithm assisted with a diversity-driven superposition strategy, achieving better learning results with improved convergence properties. The effectiveness of the proposed method is verified on synthetic data, and its potential to solve the cold-start problem of sequential recommendation systems is demonstrated on real-world data.
The superposition of temporal point processes has been studied for many years, although the usefulness of such models for practical applications has not be fully developed. We investigate superposed Hawkes process as an important class of such models, with properties studied in the framework of least squares estimation. The superposition of Hawkes processes is demonstrated to be beneficial for tightening the upper bound of excess risk under certain conditions, and we show the feasibility of the benefit in typical situations. The usefulness of superposed Hawkes processes is verified on synthetic data, and its potential to solve the cold-start problem of recommendation systems is demonstrated on real-world data.
A parametric point process model is developed, with modeling based on the assumption that sequential observations often share latent phenomena, while also possessing idiosyncratic effects. An alternating optimization method is proposed to learn a "registered" point process that accounts for shared structure, as well as "warping" functions that characterize idiosyncratic aspects of each observed sequence. Under reasonable constraints, in each iteration we update the sample-specific warping functions by solving a set of constrained nonlinear programming problems in parallel, and update the model by maximum likelihood estimation. The justifiability, complexity and robustness of the proposed method are investigated in detail, and the influence of sequence stitching on the learning results is examined empirically. Experiments on both synthetic and real-world data demonstrate that the method yields explainable point process models, achieving encouraging results compared to state-of-the-art methods.
We propose an effective method to solve the event sequence clustering problems based on a novel Dirichlet mixture model of a special but significant type of point processes --- Hawkes process. In this model, each event sequence belonging to a cluster is generated via the same Hawkes process with specific parameters, and different clusters correspond to different Hawkes processes. The prior distribution of the Hawkes processes is controlled via a Dirichlet distribution. We learn the model via a maximum likelihood estimator (MLE) and propose an effective variational Bayesian inference algorithm. We specifically analyze the resulting EM-type algorithm in the context of inner-outer iterations and discuss several inner iteration allocation strategies. The identifiability of our model, the convergence of our learning method, and its sample complexity are analyzed in both theoretical and empirical ways, which demonstrate the superiority of our method to other competitors. The proposed method learns the number of clusters automatically and is robust to model misspecification. Experiments on both synthetic and real-world data show that our method can learn diverse triggering patterns hidden in asynchronous event sequences and achieve encouraging performance on clustering purity and consistency.