We consider how to effectively use prior knowledge when learning a Bayesian model from streaming environments where the data come infinitely and sequentially. This problem is highly important in the era of data explosion and rich sources of precious external knowledge such as pre-trained models, ontologies, Wikipedia, etc. We show that some existing approaches can forget any knowledge very fast. We then propose a novel framework that enables to incorporate the prior knowledge of different forms into a base Bayesian model for data streams. Our framework subsumes some existing popular models for time-series/dynamic data. Extensive experiments show that our framework outperforms existing methods with a large margin. In particular, our framework can help Bayesian models generalize well on extremely short text while other methods overfit. The implementation of our framework is available at https://github.com/bachtranxuan/TPS.git.
Learning hidden topics in data streams has been paid a great deal of attention by researchers with a lot of proposed methods, but exploiting prior knowledge in general and a knowledge graph in particular has not been taken into adequate consideration in these methods. Prior knowledge that is derived from human knowledge (e.g. Wordnet) or a pre-trained model (e.g.Word2vec) is very valuable and useful to help topic models work better, especially on short texts. However, previous work often ignores this resource, or it can only utilize prior knowledge of a vector form in a simple way. In this paper, we propose a novel graph convolutional topic model (GCTM) which integrates graph convolutional networks (GCN) into a topic model and a learning method which learns the networks and the topic model simultaneously for data streams. In each minibatch, our method not only can exploit an external knowledge graph but also can balance between the external and old knowledge to perform well on new data. We conduct extensive experiments to evaluate our method with both human graph knowledge(Wordnet) and a graph built from pre-trained word embeddings (Word2vec). The experimental results show that our method achieves significantly better performances than the state-of-the-art baselines in terms of probabilistic predictive measure and topic coherence. In particular, our method can work well when dealing with short texts as well as concept drift. The implementation of GCTM is available at https://github.com/bachtranxuan/GCTM.git.
High-dimensional observations and unknown dynamics are major challenges when applying optimal control to many real-world decision making tasks. The Learning Controllable Embedding (LCE) framework addresses these challenges by embedding the observations into a lower dimensional latent space, estimating the latent dynamics, and then performing control directly in the latent space. To ensure the learned latent dynamics are predictive of next-observations, all existing LCE approaches decode back into the observation space and explicitly perform next-observation prediction---a challenging high-dimensional task that furthermore introduces a large number of nuisance parameters (i.e., the decoder) which are discarded during control. In this paper, we propose a novel information-theoretic LCE approach and show theoretically that explicit next-observation prediction can be replaced with predictive coding. We then use predictive coding to develop a decoder-free LCE model whose latent dynamics are amenable to locally-linear control. Extensive experiments on benchmark tasks show that our model reliably learns a controllable latent space that leads to superior performance when compared with state-of-the-art LCE baselines.
One of the core problems in statistical models is the estimation of a posterior distribution. For topic models, the problem of posterior inference for individual texts is particularly important, especially when dealing with data streams, but is often intractable in the worst case. As a consequence, existing methods for posterior inference are approximate and do not have any guarantee on neither quality nor convergence rate. In this paper, we introduce a provably fast algorithm, namely Online Maximum a Posteriori Estimation (OPE), for posterior inference in topic models. OPE has more attractive properties than existing inference approaches, including theoretical guarantees on quality and fast rate of convergence to a local maximal/stationary point of the inference problem. The discussions about OPE are very general and hence can be easily employed in a wide range of contexts. Finally, we employ OPE to design three methods for learning Latent Dirichlet Allocation from text streams or large corpora. Extensive experiments demonstrate some superior behaviors of OPE and of our new learning methods.
Topic models are popular for modeling discrete data (e.g., texts, images, videos, links), and provide an efficient way to discover hidden structures/semantics in massive data. One of the core problems in this field is the posterior inference for individual data instances. This problem is particularly important in streaming environments, but is often intractable. In this paper, we investigate the use of the Frank-Wolfe algorithm (FW) for recovering sparse solutions to posterior inference. From detailed elucidation of both theoretical and practical aspects, FW exhibits many interesting properties which are beneficial to topic modeling. We then employ FW to design fast methods, including ML-FW, for learning latent Dirichlet allocation (LDA) at large scales. Extensive experiments show that to reach the same predictiveness level, ML-FW can perform tens to thousand times faster than existing state-of-the-art methods for learning LDA from massive/streaming data.
Non-convex optimization problems often arise from probabilistic modeling, such as estimation of posterior distributions. Non-convexity makes the problems intractable, and poses various obstacles for us to design efficient algorithms. In this work, we attack non-convexity by first introducing the concept of \emph{probable convexity} for analyzing convexity of real functions in practice. We then use the new concept to analyze an inference problem in the \emph{Correlated Topic Model} (CTM) and related nonconjugate models. Contrary to the existing belief of intractability, we show that this inference problem is concave under certain conditions. One consequence of our analyses is a novel algorithm for learning CTM which is significantly more scalable and qualitative than existing methods. Finally, we highlight that stochastic gradient algorithms might be a practical choice to resolve efficiently non-convex problems. This finding might find beneficial in many contexts which are beyond probabilistic modeling.
Inference is an integral part of probabilistic topic models, but is often non-trivial to derive an efficient algorithm for a specific model. It is even much more challenging when we want to find a fast inference algorithm which always yields sparse latent representations of documents. In this article, we introduce a simple framework for inference in probabilistic topic models, denoted by FW. This framework is general and flexible enough to be easily adapted to mixture models. It has a linear convergence rate, offers an easy way to incorporate prior knowledge, and provides us an easy way to directly trade off sparsity against quality and time. We demonstrate the goodness and flexibility of FW over existing inference methods by a number of tasks. Finally, we show how inference in topic models with nonconjugate priors can be done efficiently.