Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Topic Modeling": models, code, and papers

Coherent Dialogue with Attention-based Language Models

Nov 21, 2016
Hongyuan Mei, Mohit Bansal, Matthew R. Walter

We model coherent conversation continuation via RNN-based dialogue models equipped with a dynamic attention mechanism. Our attention-RNN language model dynamically increases the scope of attention on the history as the conversation continues, as opposed to standard attention (or alignment) models with a fixed input scope in a sequence-to-sequence model. This allows each generated word to be associated with the most relevant words in its corresponding conversation history. We evaluate the model on two popular dialogue datasets, the open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot dataset, and achieve significant improvements over the state-of-the-art and baselines on several metrics, including complementary diversity-based metrics, human evaluation, and qualitative visualizations. We also show that a vanilla RNN with dynamic attention outperforms more complex memory models (e.g., LSTM and GRU) by allowing for flexible, long-distance memory. We promote further coherence via topic modeling-based reranking.

* To appear at AAAI 2017 

Topic Analysis of Superconductivity Literature by Semantic Non-negative Matrix Factorization

Dec 01, 2021
Valentin Stanev, Erik Skau, Ichiro Takeuchi, Boian S. Alexandrov

We utilize a recently developed topic modeling method called SeNMFk, extending the standard Non-negative Matrix Factorization (NMF) methods by incorporating the semantic structure of the text, and adding a robust system for determining the number of topics. With SeNMFk, we were able to extract coherent topics validated by human experts. From these topics, a few are relatively general and cover broad concepts, while the majority can be precisely mapped to specific scientific effects or measurement techniques. The topics also differ by ubiquity, with only three topics prevalent in almost 40 percent of the abstract, while each specific topic tends to dominate a small subset of the abstracts. These results demonstrate the ability of SeNMFk to produce a layered and nuanced analysis of large scientific corpora.


Improving Entity Linking by Modeling Latent Entity Type Information

Jan 06, 2020
Shuang Chen, Jinpeng Wang, Feng Jiang, Chin-Yew Lin

Existing state of the art neural entity linking models employ attention-based bag-of-words context model and pre-trained entity embeddings bootstrapped from word embeddings to assess topic level context compatibility. However, the latent entity type information in the immediate context of the mention is neglected, which causes the models often link mentions to incorrect entities with incorrect type. To tackle this problem, we propose to inject latent entity type information into the entity embeddings based on pre-trained BERT. In addition, we integrate a BERT-based entity similarity score into the local context model of a state-of-the-art model to better capture latent entity type information. Our model significantly outperforms the state-of-the-art entity linking models on standard benchmark (AIDA-CoNLL). Detailed experiment analysis demonstrates that our model corrects most of the type errors produced by the direct baseline.

* Accepted by AAAI 2020 

Measuring LDA Topic Stability from Clusters of Replicated Runs

Aug 24, 2018
Mika Mäntylä, Maëlick Claes, Umar Farooq

Background: Unstructured and textual data is increasing rapidly and Latent Dirichlet Allocation (LDA) topic modeling is a popular data analysis methods for it. Past work suggests that instability of LDA topics may lead to systematic errors. Aim: We propose a method that relies on replicated LDA runs, clustering, and providing a stability metric for the topics. Method: We generate k LDA topics and replicate this process n times resulting in n*k topics. Then we use K-medioids to cluster the n*k topics to k clusters. The k clusters now represent the original LDA topics and we present them like normal LDA topics showing the ten most probable words. For the clusters, we try multiple stability metrics, out of which we recommend Rank-Biased Overlap, showing the stability of the topics inside the clusters. Results: We provide an initial validation where our method is used for 270,000 Mozilla Firefox commit messages with k=20 and n=20. We show how our topic stability metrics are related to the contents of the topics. Conclusions: Advances in text mining enable us to analyze large masses of text in software engineering but non-deterministic algorithms, such as LDA, may lead to unreplicable conclusions. Our approach makes LDA stability transparent and is also complementary rather than alternative to many prior works that focus on LDA parameter tuning.

* ESEM '18 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)}{October 11--12, 2018}{Oulu, Finland} 

Extracting Major Topics of COVID-19 Related Tweets

Oct 05, 2021
Faezeh Azizi, Hamed Vahdat-Nejad, Hamideh Hajiabadi, Mohammad Hossein Khosravi

With the outbreak of the Covid-19 virus, the activity of users on Twitter has significantly increased. Some studies have investigated the hot topics of tweets in this period; however, little attention has been paid to presenting and analyzing the spatial and temporal trends of Covid-19 topics. In this study, we use the topic modeling method to extract global topics during the nationwide quarantine periods (March 23 to June 23, 2020) on Covid-19 tweets. We implement the Latent Dirichlet Allocation (LDA) algorithm to extract the topics and then name them with the "reopening", "death cases", "telecommuting", "protests", "anger expression", "masking", "medication", "social distance", "second wave", and "peak of the disease" titles. We additionally analyze temporal trends of the topics for the whole world and four countries. By analyzing the graphs, fascinating results are obtained from altering users' focus on topics over time.


Evaluating Generalization in Classical and Quantum Generative Models

Jan 21, 2022
Kaitlin Gili, Marta Mauri, Alejandro Perdomo-Ortiz

Defining and accurately measuring generalization in generative models remains an ongoing challenge and a topic of active research within the machine learning community. This is in contrast to discriminative models, where there is a clear definition of generalization, i.e., the model's classification accuracy when faced with unseen data. In this work, we construct a simple and unambiguous approach to evaluate the generalization capabilities of generative models. Using the sample-based generalization metrics proposed here, any generative model, from state-of-the-art classical generative models such as GANs to quantum models such as Quantum Circuit Born Machines, can be evaluated on the same ground on a concrete well-defined framework. In contrast to other sample-based metrics for probing generalization, we leverage constrained optimization problems (e.g., cardinality constrained problems) and use these discrete datasets to define specific metrics capable of unambiguously measuring the quality of the samples and the model's generalization capabilities for generating data beyond the training set but still within the valid solution space. Additionally, our metrics can diagnose trainability issues such as mode collapse and overfitting, as we illustrate when comparing GANs to quantum-inspired models built out of tensor networks. Our simulation results show that our quantum-inspired models have up to a $68 \times$ enhancement in generating unseen unique and valid samples compared to GANs, and a ratio of 61:2 for generating samples with better quality than those observed in the training set. We foresee these metrics as valuable tools for rigorously defining practical quantum advantage in the domain of generative modeling.

* 24 pages, 14 figures 

ADMM-based Networked Stochastic Variational Inference

Feb 27, 2018
Hamza Anwar, Quanyan Zhu

Owing to the recent advances in "Big Data" modeling and prediction tasks, variational Bayesian estimation has gained popularity due to their ability to provide exact solutions to approximate posteriors. One key technique for approximate inference is stochastic variational inference (SVI). SVI poses variational inference as a stochastic optimization problem and solves it iteratively using noisy gradient estimates. It aims to handle massive data for predictive and classification tasks by applying complex Bayesian models that have observed as well as latent variables. This paper aims to decentralize it allowing parallel computation, secure learning and robustness benefits. We use Alternating Direction Method of Multipliers in a top-down setting to develop a distributed SVI algorithm such that independent learners running inference algorithms only require sharing the estimated model parameters instead of their private datasets. Our work extends the distributed SVI-ADMM algorithm that we first propose, to an ADMM-based networked SVI algorithm in which not only are the learners working distributively but they share information according to rules of a graph by which they form a network. This kind of work lies under the umbrella of `deep learning over networks' and we verify our algorithm for a topic-modeling problem for corpus of Wikipedia articles. We illustrate the results on latent Dirichlet allocation (LDA) topic model in large document classification, compare performance with the centralized algorithm, and use numerical experiments to corroborate the analytical results.

* to be submitted for publishing 

Structured Black Box Variational Inference for Latent Time Series Models

Jul 04, 2017
Robert Bamler, Stephan Mandt

Continuous latent time series models are prevalent in Bayesian modeling; examples include the Kalman filter, dynamic collaborative filtering, or dynamic topic models. These models often benefit from structured, non mean field variational approximations that capture correlations between time steps. Black box variational inference with reparameterization gradients (BBVI) allows us to explore a rich new class of Bayesian non-conjugate latent time series models; however, a naive application of BBVI to such structured variational models would scale quadratically in the number of time steps. We describe a BBVI algorithm analogous to the forward-backward algorithm which instead scales linearly in time. It allows us to efficiently sample from the variational distribution and estimate the gradients of the ELBO. Finally, we show results on the recently proposed dynamic word embedding model, which was trained using our method.

* 5 pages, 1 figure; presented at the ICML 2017 Time Series Workshop 

Deep Probabilistic Graphical Modeling

Apr 25, 2021
Adji B. Dieng

Probabilistic graphical modeling (PGM) provides a framework for formulating an interpretable generative process of data and expressing uncertainty about unknowns, but it lacks flexibility. Deep learning (DL) is an alternative framework for learning from data that has achieved great empirical success in recent years. DL offers great flexibility, but it lacks the interpretability and calibration of PGM. This thesis develops deep probabilistic graphical modeling (DPGM.) DPGM consists in leveraging DL to make PGM more flexible. DPGM brings about new methods for learning from data that exhibit the advantages of both PGM and DL. We use DL within PGM to build flexible models endowed with an interpretable latent structure. One model class we develop extends exponential family PCA using neural networks to improve predictive performance while enforcing the interpretability of the latent factors. Another model class we introduce enables accounting for long-term dependencies when modeling sequential data, which is a challenge when using purely DL or PGM approaches. Finally, DPGM successfully solves several outstanding problems of probabilistic topic models, a widely used family of models in PGM. DPGM also brings about new algorithms for learning with complex data. We develop reweighted expectation maximization, an algorithm that unifies several existing maximum likelihood-based algorithms for learning models parameterized by neural networks. This unifying view is made possible using expectation maximization, a canonical inference algorithm in PGM. We also develop entropy-regularized adversarial learning, a learning paradigm that deviates from the traditional maximum likelihood approach used in PGM. From the DL perspective, entropy-regularized adversarial learning provides a solution to the long-standing mode collapse problem of generative adversarial networks, a widely used DL approach.

* This thesis was defended in April 2020 and accepted without revision. The author received her PhD in Statistics from Columbia University on May 20, 2020 

Modeling Magnetic Particle Imaging for Dynamic Tracer Distributions

Jun 24, 2021
Christina Brandt, Christiane Schmidt

Magnetic Particle Imaging (MPI) is a promising tracer-based, functional medical imaging technique which measures the non-linear response of superparamagnetic iron oxide nanoparticles (SPION) to a dynamic magnetic field. For image reconstruction, system matrices from time-consuming calibration scans are used predominantly. Finding modeled forward operators for magnetic particle imaging, which are able to compete with measured matrices in practice, is an ongoing topic of research. The existing models for magnetic particle imaging are by design not suitable for dynamic tracer concentrations. Neither modeled nor measured system matrices account for changes in the concentration during a single scanning cycle. In this paper we present a new MPI forward model for dynamic concentrations. A standard model will be introduced briefly, followed by the changes due to the dynamic behavior of the tracer concentration. Furthermore, the relevance of this new extended model is examined by investigating the influence of the extension and example reconstructions with the new and the standard model.

* 20 pages, 9 figures, submitted for possible publication