"Topic": models, code, and papers

Modeling Fuzzy Cluster Transitions for Topic Tracing

Apr 16, 2021
Xiaonan Jing, Yi Zhang, Qingyuan Hu, Julia Taylor Rayz

Twitter can be viewed as a data source for Natural Language Processing (NLP) tasks. The continuously updating data streams on Twitter make it challenging to trace real-time topic evolution. In this paper, we propose a framework for modeling fuzzy transitions of topic clusters. We extend our previous work on crisp cluster transitions by incorporating fuzzy logic in order to enrich the underlying structures identified by the framework. We apply the methodology to both computer-generated clusters of nouns from tweets and human tweet annotations. The obtained fuzzy transitions are compared with the crisp transitions on both the computer-generated clusters and the human-labeled topic sets.

* Accepted as full paper by NAFIPS'2021 
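
The abstract does not spell out how a fuzzy transition is scored. As a hedged illustration only, assume each cluster at a time step is a fuzzy set over nouns (a membership degree in [0, 1] per word) and score the transition from cluster A at time t to cluster B at time t+1 with the fuzzy Jaccard similarity: the sum of minimum memberships divided by the sum of maximum memberships. The function names, the 0.2 threshold, and the choice of similarity measure are assumptions, not the authors' definitions.

```python
from typing import Dict, List, Tuple

FuzzyCluster = Dict[str, float]  # word -> membership degree in [0, 1] (assumed representation)


def fuzzy_jaccard(a: FuzzyCluster, b: FuzzyCluster) -> float:
    """Fuzzy Jaccard similarity: sum of min memberships / sum of max memberships."""
    words = set(a) | set(b)
    inter = sum(min(a.get(w, 0.0), b.get(w, 0.0)) for w in words)
    union = sum(max(a.get(w, 0.0), b.get(w, 0.0)) for w in words)
    return inter / union if union > 0 else 0.0


def fuzzy_transitions(
    before: List[FuzzyCluster],
    after: List[FuzzyCluster],
    threshold: float = 0.2,  # hypothetical cut-off for reporting a transition
) -> List[Tuple[int, int, float]]:
    """Return (i, j, degree) for each cluster pair whose similarity reaches the threshold."""
    edges = []
    for i, a in enumerate(before):
        for j, b in enumerate(after):
            degree = fuzzy_jaccard(a, b)
            if degree >= threshold:
                edges.append((i, j, degree))
    return edges


day1 = [{"vaccine": 1.0, "dose": 0.8, "trial": 0.3}, {"election": 1.0, "vote": 0.9}]
day2 = [{"vaccine": 0.9, "dose": 0.6, "booster": 0.7}, {"vote": 0.5, "ballot": 1.0}]
print(fuzzy_transitions(day1, day2))
```

If every membership is 1.0, the score reduces to the ordinary Jaccard overlap of word sets, which is one way to view crisp transitions as a special case of the fuzzy ones in this sketch.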

Topic Propagation in Conversational Search

Apr 29, 2020
I. Mele, C. I. Muntean, F. M. Nardini, R. Perego, N. Tonellotto, O. Frieder

In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. Retrieving documents relevant to a given utterance in a conversation is challenging due to the ambiguity of natural language and the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques, which achieve improvements of up to 0.28 (+93%) for P@1 and 0.19 (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.

* 5 pages 
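
The evaluation relies on traditional IR metrics at small cutoffs. For readers unfamiliar with them, here is a sketch of two standard cutoff metrics, precision at k and nDCG at k, using linear gains and a log2 rank discount (some toolkits use 2^g - 1 gains instead). It is a generic illustration, not the CAsT or the authors' evaluation script, and the ranking and relevance judgments below are made up.

```python
import math
from typing import Dict, List


def precision_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """Fraction of the top-k passages judged relevant (relevance grade > 0)."""
    top = ranking[:k]
    return sum(1 for doc in top if qrels.get(doc, 0) > 0) / k


def dcg_at_k(gains: List[int], k: int) -> float:
    # Rank i (0-based) is discounted by log2(i + 2), so the first rank has discount 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ordering of the judgments."""
    gains = [qrels.get(doc, 0) for doc in ranking]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


ranking = ["p3", "p1", "p7"]           # hypothetical system output for one utterance
qrels = {"p1": 2, "p3": 1, "p9": 2}    # hypothetical graded judgments
print(precision_at_k(ranking, qrels, 1), ndcg_at_k(ranking, qrels, 3))
```

Small cutoffs such as k = 1 or 3 reward systems that place a relevant passage at the very top, which matters in a conversational setting where only one or a few answers are shown.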

Geometric Dirichlet Means algorithm for topic inference

Oct 27, 2016
Mikhail Yurochkin, XuanLong Nguyen

We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end, we study the optimization of a geometric loss function, which is a surrogate to the LDA's likelihood. Our method involves a fast optimization-based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under certain conditions. The algorithm is evaluated with extensive experiments on simulated and real data.
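
The abstract describes weighted clustering followed by geometric corrections. The sketch below is a simplified reading of that idea under stated assumptions: documents become normalized word-frequency vectors, weighted k-means (weights = document lengths) finds centroids, and each centroid is then pushed outward from the overall data center. The correction is needed because documents are mixtures of topics, so cluster means sit strictly inside the topic simplex while the topics are its vertices. The extension factor (distance of the farthest cluster member divided by the centroid's distance) and the final Euclidean projection onto the probability simplex are assumptions for illustration; consult the paper for the exact algorithm and its consistency conditions.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def weighted_kmeans(X: np.ndarray, w: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Lloyd's algorithm where each point's pull on its centroid is scaled by its weight."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return centers, labels


def gdm_sketch(counts: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Estimate k topics (rows summing to 1) from a documents-by-vocabulary count matrix."""
    lengths = counts.sum(axis=1).astype(float)
    X = counts / lengths[:, None]                      # normalized word frequencies
    center = np.average(X, axis=0, weights=lengths)    # overall data center
    centers, labels = weighted_kmeans(X, lengths, k, seed=seed)
    topics = []
    for j in range(k):
        direction = centers[j] - center
        norm = np.linalg.norm(direction)
        members = X[labels == j]
        if norm > 0 and len(members) > 0:
            # Assumed extension: stretch the centroid out to its farthest member's distance.
            scale = np.linalg.norm(members - center, axis=1).max() / norm
        else:
            scale = 1.0
        topics.append(project_to_simplex(center + scale * direction))
    return np.array(topics)
```

On toy counts generated from two well-separated topics, the stretched centroids land back on the original topic vectors, whereas the raw k-means centroids stay shrunk toward the center.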

How Pandemic Spread in News: Text Analysis Using Topic Model

Feb 19, 2021
Minghao Wang, Paolo Mengoni

Research on COVID-19 has increased substantially, both in biology and in other fields. This study conducts a text analysis using an LDA topic model. We first scraped a total of 1127 articles and 5563 comments on SCMP covering COVID-19 from Jan 20 to May 19, then trained the LDA model and tuned its parameters using C_v coherence as the model evaluation method. With the optimal model, we analyze the dominant topics, the representative documents of each topic, and the inconsistency between articles and comments. Finally, three possible improvements are discussed.
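
The tuning step trains LDA under several settings and keeps the one with the best C_v coherence. C_v itself combines sliding-window co-occurrence counts, NPMI, and an indirect cosine confirmation, which is too long to reproduce here; the sketch below computes only a simpler relative, the mean pairwise NPMI of each topic's top words using document-level co-occurrence, and picks the candidate setting with the highest average score. The documents and candidate topics are toy inputs, and this is not the authors' code (gensim offers a full C_v implementation through `CoherenceModel` with `coherence='c_v'`).

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set


def npmi_coherence(top_words: Sequence[str], docs: List[Set[str]]) -> float:
    """Mean NPMI over all pairs of top words, using document-level co-occurrence."""
    n = len(docs)
    scores = []
    for a, b in combinations(top_words, 2):
        p_a = sum(a in d for d in docs) / n
        p_b = sum(b in d for d in docs) / n
        p_ab = sum(a in d and b in d for d in docs) / n
        if p_ab == 0:
            scores.append(-1.0)  # the words never co-occur: NPMI's lower bound
        elif p_ab == 1:
            scores.append(1.0)   # the words co-occur in every document
        else:
            pmi = math.log(p_ab / (p_a * p_b))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)


def select_model(candidates: Dict[int, List[List[str]]], docs: List[Set[str]]) -> int:
    """Return the key (e.g. number of topics) whose topics have the highest mean coherence."""
    def mean_coherence(topics: List[List[str]]) -> float:
        return sum(npmi_coherence(t, docs) for t in topics) / len(topics)
    return max(candidates, key=lambda key: mean_coherence(candidates[key]))


docs = [{"mask", "virus", "lockdown"}, {"mask", "virus"}, {"market", "stock"}, {"market", "stock", "virus"}]
candidates = {
    2: [["mask", "virus"], ["market", "stock"]],                         # hypothetical 2-topic model
    3: [["mask", "market"], ["stock", "lockdown"], ["virus", "stock"]],  # hypothetical 3-topic model
}
print(select_model(candidates, docs))
```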

Supervised topic models for clinical interpretability

Dec 06, 2016
Michael C. Hughes, Huseyin Melih Elibol, Thomas McCoy, Roy Perlis, Finale Doshi-Velez

Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often minimal, leading to poor predictions on held-out data. We investigate penalized optimization methods for training sLDA that produce interpretable topic-word parameters and useful held-out predictions, using recognition networks to speed up inference. We report preliminary results on synthetic data and on predicting successful antidepressant medication given a patient's diagnostic history.

* Accepted poster presentation at NIPS 2016 Workshop on Machine Learning for Health
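
The first problem can be made concrete with the log-likelihood of a single document. Assuming a simplified sLDA-style factorization in which the label y_d depends on the document's topic proportions θ_d through parameters η (the notation is ours, not the paper's), a document with N_d words contributes N_d word terms but only one label term. With N_d = 1000, the label is one summand out of 1001, so the word terms dominate the fit of the topic-word parameters φ. One generic remedy, shown only to illustrate the idea of penalized optimization and not necessarily the authors' exact objective, is to up-weight the label term by a multiplier λ:

```latex
\begin{aligned}
\log p(\mathbf{x}_d, y_d \mid \theta_d, \phi, \eta)
  &= \underbrace{\sum_{n=1}^{N_d} \log \Big( \sum_{k=1}^{K} \theta_{dk}\, \phi_{k, x_{dn}} \Big)}_{\log p(\mathbf{x}_d \mid \theta_d, \phi)}
   + \log p(y_d \mid \theta_d, \eta), \\
\mathcal{L}_\lambda
  &= \sum_{d} \Big[ \log p(\mathbf{x}_d \mid \theta_d, \phi) + \lambda \log p(y_d \mid \theta_d, \eta) \Big],
  \qquad \lambda \ge 1.
\end{aligned}
```

Setting λ = 1 recovers the unweighted joint likelihood; larger λ trades some fit to the words for label predictions that actually shape φ.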

Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference

Nov 18, 2017
Moontae Lee, David Mimno

The anchor words algorithm performs provably efficient topic model inference by finding an approximate convex hull in a high-dimensional word co-occurrence space. However, the existing greedy algorithm often selects poor anchor words, reducing topic quality and interpretability. Rather than finding an approximate convex hull in a high-dimensional space, we propose to find an exact convex hull in a visualizable 2- or 3-dimensional space. Such low-dimensional embeddings both improve topics and clearly show users why the algorithm selects certain words.
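
Once each word's co-occurrence row has been embedded into two dimensions (for example with t-SNE or PCA; that embedding step is assumed here and not shown), anchor candidates are the vertices of the exact convex hull of the embedded points. Words strictly inside the hull can be written as mixtures of hull words, which is why only vertices qualify. The sketch uses Andrew's monotone chain algorithm on hypothetical 2-D coordinates; the paper also works in 3-D and may select among hull vertices differently, so treat this as an illustration rather than its implementation.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # last point of each chain repeats the other's first


def anchor_candidates(embedding: Dict[str, Point]) -> List[str]:
    """Words whose 2-D embedding is a vertex of the convex hull (hypothetical selection rule)."""
    vertices = set(hull_vertices(list(embedding.values())))
    return sorted(w for w, p in embedding.items() if p in vertices)


embedding = {  # hypothetical 2-D coordinates for five words
    "vaccine": (0.0, 0.0), "election": (4.0, 0.0), "football": (2.0, 3.0),
    "news": (2.0, 1.0), "today": (1.5, 0.5),
}
print(anchor_candidates(embedding))
```

Because the hull is exact in two or three dimensions, a user can plot the points and see directly why generic words such as "news" are not chosen: they sit inside the hull.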

Tracing Topic Transitions with Temporal Graph Clusters

Apr 16, 2021
Xiaonan Jing, Qingyuan Hu, Yi Zhang, Julia Taylor Rayz

Twitter serves as a data source for many Natural Language Processing (NLP) tasks. It can be challenging to identify topics on Twitter due to the continuously updating data stream. In this paper, we present an unsupervised graph-based framework to identify the evolution of sub-topics within two weeks of real-world Twitter data. We first employ a Markov Clustering Algorithm (MCL) with a node removal method to identify optimal graph clusters from temporal Graph-of-Words (GoW). Subsequently, we model the clustering transitions between the temporal graphs to identify the topic evolution. Finally, the transition flows generated by both the computational approach and human annotations are compared to ensure the validity of our framework.

* Accepted as full paper by the 34th International FLAIRS Conference 
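
The clustering step relies on the Markov Clustering Algorithm (MCL), which alternates expansion (matrix powers, letting random-walk flow spread) and inflation (element-wise powers followed by column normalization, strengthening strong flows and weakening weak ones) until the matrix stops changing. The sketch below is a plain-Python MCL on a small adjacency matrix; the expansion power 2, inflation 2.0, added self-loops, pruning threshold, and toy graph are common defaults or assumptions, and the paper's node removal method and temporal Graph-of-Words construction are not reproduced.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (columns of all zeros stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9, prune: float = 1e-6) -> List[Set[int]]:
    n = len(adj)
    # Add self-loops, then turn the adjacency matrix into a column-stochastic flow matrix.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):  # expansion: m raised to the given power
            expanded = matmul(expanded, m)
        # Inflation: prune tiny flows, raise the rest element-wise, renormalize columns.
        inflated = normalize_columns(
            [[v ** inflation if v > prune else 0.0 for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; each attractor's nonzero columns form one cluster.
    clusters: List[Set[int]] = []
    for row in m:
        members = {j for j, v in enumerate(row) if v > prune}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


two_triangles = [  # nodes 0-2 and 3-5 form two disconnected triangles
    [0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))
```

On a Graph-of-Words, nodes would be words and edge weights co-occurrence counts; the inflation parameter then controls how fine-grained the resulting sub-topic clusters are.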

Topic Model Supervised by Understanding Map

Oct 21, 2021
Gangli Liu

Inspired by the notion of Center of Mass in physics, an extension called Semantic Center of Mass (SCOM) is proposed and used to discover the abstract "topic" of a document. The notion is situated within a framework model called Understanding Map Supervised Topic Model (UM-S-TM). The design aim of UM-S-TM is to let both the document content and a semantic network (specifically, an Understanding Map) play a role in interpreting the meaning of a document. Based on different justifications, three possible methods are devised to discover the SCOM of a document. Experiments on artificial documents and Understanding Maps are conducted to test their outcomes. In addition, the model's ability to vectorize documents and capture sequential information is tested. We also compare UM-S-TM with probabilistic topic models such as Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Analysis (pLSA).
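
The physical analogy is the center of mass, R = Σ m_i r_i / Σ m_i. The abstract does not say how positions and masses are defined on an Understanding Map, and the paper proposes three different methods, so the sketch below is only a hypothetical analog: each concept has assumed 2-D coordinates, each word's mass is its frequency in the document, and the concept nearest to the resulting weighted centroid is reported as the document's "topic". All names and coordinates are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def center_of_mass(masses: Dict[str, float], positions: Dict[str, Point]) -> Point:
    """R = sum(m_i * r_i) / sum(m_i), using only words that have a position on the map."""
    total = sum(m for w, m in masses.items() if w in positions)
    if total == 0:
        raise ValueError("no positioned words in the document")
    x = sum(m * positions[w][0] for w, m in masses.items() if w in positions) / total
    y = sum(m * positions[w][1] for w, m in masses.items() if w in positions) / total
    return (x, y)


def nearest_concept(point: Point, positions: Dict[str, Point]) -> str:
    return min(positions, key=lambda w: math.dist(point, positions[w]))


def scom_topic(tokens: List[str], positions: Dict[str, Point]) -> str:
    """Hypothetical SCOM analog: frequency-weighted centroid, then the closest mapped concept."""
    return nearest_concept(center_of_mass(Counter(tokens), positions), positions)


positions = {  # hypothetical 2-D layout of a tiny concept map
    "fruit": (0.0, 0.0), "apple": (1.0, 0.0), "banana": (-1.0, 0.0), "engine": (10.0, 0.0),
}
print(scom_topic(["apple", "banana", "apple", "banana", "the"], positions))
```

In this toy setup the centroid can land on a concept that never appears in the text, such as "fruit" for a document mentioning only apples and bananas, which illustrates how a center of mass can point to an abstract topic rather than a surface word.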
