"Topic": models, code, and papers

A Dynamic Topic Identification and Labeling Approach of COVID-19 Tweets

Aug 13, 2021
Khandaker Tayef Shahriar, Iqbal H. Sarker, Muhammad Nazrul Islam, Mohammad Ali Moni

This paper formulates the problem of dynamically identifying key topics, with proper labels, from COVID-19 tweets to provide an overview of wider public opinion. Social media is now one of the best ways to connect people through Internet technology and is considered an essential part of daily life. In late December 2019, an outbreak of the novel coronavirus disease, COVID-19, was reported, and the World Health Organization declared an emergency due to its rapid spread all over the world. The COVID-19 epidemic has affected how many people across the globe use social media. Twitter, one of the most influential social media services, has seen a dramatic increase in use since the epidemic began. Extracting specific topics with labels from COVID-19 tweets dynamically, rather than through a manual topic labeling approach, is therefore a challenging problem for highlighting conversations. In this paper, we propose a framework that automatically identifies key topics with labels from tweets, using the top unigram feature of the aspect-term clusters from topics generated by Latent Dirichlet Allocation (LDA). Our experimental results show that this dynamic topic identification and labeling approach is effective, achieving an accuracy of 85.48% with respect to the manual static approach.
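
The labeling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the LDA topics are already available as term-weight dictionaries and that a set of aspect terms has been extracted separately. The function name `label_topic`, the example terms, and all weights are hypothetical.

```python
from typing import Dict, Set


def label_topic(term_weights: Dict[str, float], aspect_terms: Set[str]) -> str:
    """Return the highest-weighted single-word aspect term of one topic."""
    candidates = {
        term: weight
        for term, weight in term_weights.items()
        if term in aspect_terms and " " not in term  # keep unigrams only
    }
    if not candidates:
        raise ValueError("topic has no unigram aspect term")
    # Highest weight first; ties are broken alphabetically for determinism.
    return min(candidates, key=lambda term: (-candidates[term], term))


# Hypothetical LDA output: topic id -> {term: weight}. The values are made up.
topics = {
    0: {"vaccine": 0.12, "get": 0.15, "dose": 0.08, "second dose": 0.05},
    1: {"lockdown": 0.20, "stay": 0.11, "home": 0.10},
}
# Hypothetical aspect terms, assumed to come from a separate extraction step.
aspect_terms = {"vaccine", "dose", "second dose", "lockdown", "home"}

labels = {tid: label_topic(weights, aspect_terms) for tid, weights in topics.items()}
print(labels)
```

Note that "get" has the highest weight in topic 0 but is not an aspect term, so the label falls to "vaccine".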

Low-Resource Contextual Topic Identification on Speech

Sep 28, 2018
Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploring the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model which is able to leverage the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one our attention-based contextual model significantly outperforms the context-independent models.

* Accepted for publication at 2018 IEEE Workshop on Spoken Language Technology (SLT) 
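
To make the idea of selectively using context concrete, here is a small sketch of attention over neighboring segments. It uses plain dot-product attention with no learned parameters, which is a simplifying assumption; the abstract does not give the paper's scoring function or architecture. The function `attend` and the toy segment vectors are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(segments: List[List[float]], index: int) -> List[float]:
    """Context vector for segments[index]: attention-weighted mean of all segments."""
    query = segments[index]
    # Segments similar to the query receive larger weights.
    weights = softmax([dot(query, seg) for seg in segments])
    dim = len(query)
    return [sum(w * seg[d] for w, seg in zip(weights, segments)) for d in range(dim)]


# Toy segment representations (made up): the first two are on a similar topic.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
context = attend(segments, 0)
print(context)
```

A classifier would then see the segment's own vector together with `context`, rather than the segment in isolation.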

Deep topic modeling by multilayer bootstrap network and lasso

Oct 24, 2019
Jianyu Wang, Xiao-Lei Zhang

Topic modeling is widely studied for the dimension reduction and analysis of documents. However, it is formulated as a difficult optimization problem. Current approximate solutions also suffer from inaccurate model- or data-assumptions. To deal with the above problems, we propose a polynomial-time deep topic model with no model and data assumptions. Specifically, we first apply multilayer bootstrap network (MBN), which is an unsupervised deep model, to reduce the dimension of documents, and then use the low-dimensional data representations or their clustering results as the target of supervised Lasso for topic word discovery. To our knowledge, this is the first time that MBN and Lasso are applied to unsupervised topic modeling. Experimental comparison results with five representative topic models on the 20-newsgroups and TDT2 corpora illustrate the effectiveness of the proposed algorithm.

Would you Like to Talk about Sports Now? Towards Contextual Topic Suggestion for Open-Domain Conversational Agents

May 28, 2020
Ali Ahmadvand, Harshita Sahijwani, Eugene Agichtein

To hold a true conversation, an intelligent agent should be able to occasionally take initiative and recommend the next natural conversation topic. This is a challenging task. A topic suggested by the agent should be relevant to the person, appropriate for the conversation context, and the agent should have something interesting to say about it. Thus, a scripted, or one-size-fits-all, popularity-based topic suggestion is doomed to fail. Instead, we explore different methods for a personalized, contextual topic suggestion for open-domain conversations. We formalize the Conversational Topic Suggestion problem (CTS) to more clearly identify the assumptions and requirements. We also explore three possible approaches to solve this problem: (1) model-based sequential topic suggestion to capture the conversation context (CTS-Seq), (2) Collaborative Filtering-based suggestion to capture previous successful conversations from similar users (CTS-CF), and (3) a hybrid approach combining both conversation context and collaborative filtering. To evaluate the effectiveness of these methods, we use real conversations collected as part of the Amazon Alexa Prize 2018 Conversational AI challenge. The results are promising: the CTS-Seq model suggests topics with 23% higher accuracy than the baseline, and incorporating collaborative filtering signals into a hybrid CTS-Seq-CF model further improves recommendation accuracy by 12%. Together, our proposed models, experiments, and analysis significantly advance the study of open-domain conversational agents, and suggest promising directions for future improvements.

* CHIIR 2020 

Topical Keyphrase Extraction with Hierarchical Semantic Networks

Oct 17, 2019
Yoo yeon Sung, Seoung Bum Kim

Topical keyphrase extraction is used to summarize large collections of text documents. However, traditional methods cannot properly reflect the intrinsic semantics and relationships of keyphrases because they rely on a simple term-frequency-based process. Consequently, these methods are not effective in obtaining significant contextual knowledge. To resolve this, we propose a topical keyphrase extraction method based on a hierarchical semantic network and multiple centrality network measures that together reflect the hierarchical semantics of keyphrases. We conduct experiments on real data to examine the practicality of the proposed method and to compare its performance with that of existing topical keyphrase extraction methods. The results confirm that the proposed method outperforms state-of-the-art topical keyphrase extraction methods in terms of the representativeness of the selected keyphrases for each topic. The proposed method can effectively reflect intrinsic keyphrase semantics and interrelationships.

* Accepted at Decision Support Systems 2019 on 16 Sep 2019 
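
As a rough illustration of ranking keyphrases by a network centrality measure, the sketch below builds a flat co-occurrence network and computes degree centrality. Both choices are assumptions made for brevity: the paper uses a hierarchical semantic network and several centrality measures, neither of which is detailed in the abstract. The input phrases are made up.

```python
from itertools import combinations
from typing import Dict, List, Set


def degree_centrality(documents: List[List[str]]) -> Dict[str, float]:
    """Degree centrality of keyphrases in a document co-occurrence network."""
    neighbors: Dict[str, Set[str]] = {}
    for phrases in documents:
        unique = sorted(set(phrases))
        for phrase in unique:
            neighbors.setdefault(phrase, set())  # keep isolated phrases as nodes
        # Two phrases are linked if they appear in the same document.
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {phrase: 0.0 for phrase in neighbors}
    return {phrase: len(adj) / (n - 1) for phrase, adj in neighbors.items()}


# Hypothetical candidate keyphrases per document.
documents = [
    ["topic model", "keyphrase extraction", "semantic network"],
    ["topic model", "semantic network"],
    ["keyphrase extraction", "centrality"],
]
centrality = degree_centrality(documents)
ranked = sorted(centrality, key=lambda p: (-centrality[p], p))
print(ranked)
```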

Learning Topics using Semantic Locality

Apr 11, 2018
Ziyi Zhao, Krittaphat Pugdeethosapol, Sheng Lin, Zhe Li, Caiwen Ding, Yanzhi Wang, Qinru Qiu

Topic modeling discovers the latent topic probabilities of given text documents. To generate more meaningful topics that better represent a given document, we propose a new feature extraction technique that can be used in the data preprocessing stage. The method consists of three steps. First, it generates words and word pairs from every document. Second, it applies a two-way TF-IDF algorithm to the words and word pairs for semantic filtering. Third, it uses the K-means algorithm to merge word pairs that have similar semantic meaning. Experiments are carried out on the Open Movie Database (OMDb), Reuters, and 20NewsGroup datasets. The mean Average Precision score is used as the evaluation metric. Compared with other state-of-the-art topic models, such as Latent Dirichlet Allocation and traditional Restricted Boltzmann Machines, our proposed data preprocessing can improve the generated topic accuracy by up to 12.99%.

* International Conference on Pattern Recognition (ICPR), 2018
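
The first two preprocessing steps can be sketched as below. This is only an approximation under stated assumptions: the abstract does not define the "two-way" TF-IDF variant, so standard TF-IDF stands in for it, word pairs are taken as unordered pairs of distinct words within a document, and the K-means merging step is omitted. The helper names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List


def features(tokens: List[str]) -> List[str]:
    """Step 1: single words plus unordered pairs of distinct words in one document."""
    pairs = [f"{a}_{b}" for a, b in combinations(sorted(set(tokens)), 2)]
    return list(tokens) + pairs


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Step 2 (stand-in): plain TF-IDF over the word and word-pair features."""
    counts = [Counter(features(doc)) for doc in docs]
    doc_freq = Counter(feat for c in counts for feat in c)
    n_docs = len(docs)
    scores = []
    for c in counts:
        total = sum(c.values())
        scores.append(
            {feat: (n / total) * math.log(n_docs / doc_freq[feat]) for feat, n in c.items()}
        )
    return scores


# Made-up tokenized documents.
docs = [
    ["space", "alien", "ship"],
    ["space", "love", "story"],
    ["love", "story", "drama"],
]
scores = tf_idf(docs)
# Features with low scores would be filtered out before clustering.
print(sorted(scores[0], key=lambda f: (-scores[0][f], f))[:3])
```

A pair such as `alien_ship`, which occurs in only one document, scores higher than the word `space`, which is shared across documents.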

Graph Convolutional Topic Model for Data Streams

Mar 17, 2020
Ngo Van Linh, Tran Xuan Bach, Khoat Than

Learning hidden topics in data streams has received a great deal of attention from researchers, and many methods have been proposed, but these methods have not adequately exploited prior knowledge in general, or knowledge graphs in particular. Prior knowledge derived from human knowledge (e.g. WordNet) or a pre-trained model (e.g. Word2vec) is very valuable for helping topic models work better, especially on short texts. However, previous work often ignores this resource, or can only utilize prior knowledge in vector form in a simple way. In this paper, we propose a novel graph convolutional topic model (GCTM), which integrates graph convolutional networks (GCN) into a topic model, and a learning method that learns the networks and the topic model simultaneously for data streams. In each minibatch, our method can not only exploit an external knowledge graph but also balance external and old knowledge to perform well on new data. We conduct extensive experiments to evaluate our method with both a human knowledge graph (WordNet) and a graph built from pre-trained word embeddings (Word2vec). The experimental results show that our method achieves significantly better performance than the state-of-the-art baselines in terms of probabilistic predictive measure and topic coherence. In particular, our method works well when dealing with short texts as well as concept drift. The implementation of GCTM is available at

Learning with fuzzy hypergraphs: a topical approach to query-oriented text summarization

Jun 22, 2019
Hadrien Van Lierde, Tommy W. S. Chow

Existing graph-based methods for extractive document summarization represent sentences of a corpus as the nodes of a graph or a hypergraph in which edges depict relationships of lexical similarity between sentences. Such approaches fail to capture semantic similarities between sentences when they express similar information but have few words in common and are thus lexically dissimilar. To overcome this issue, we propose to extract semantic similarities based on topical representations of sentences. Inspired by the Hierarchical Dirichlet Process, we propose a probabilistic topic model to infer topic distributions of sentences. As each topic defines a semantic connection among a group of sentences with a certain degree of membership for each sentence, we propose a fuzzy hypergraph model in which nodes are sentences and fuzzy hyperedges are topics. To produce an informative summary, we extract a set of sentences from the corpus by simultaneously maximizing their relevance to a user-defined query, their centrality in the fuzzy hypergraph, and their coverage of topics present in the corpus. We formulate a polynomial-time algorithm building on the theory of submodular functions to solve the associated optimization problem. A thorough comparative analysis with other graph-based summarization systems is included in the paper. Our results show the superiority of our method in terms of content coverage of the summaries.

* Information Sciences, 496 (2019), 212-224 
* 8 figures 
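
The submodular selection idea can be illustrated with the standard greedy algorithm on a simple coverage objective. This sketch makes strong simplifying assumptions: topic membership is crisp rather than fuzzy, the objective counts covered topics only (no query relevance or centrality term), and the budget is a sentence count. The sentence ids and topic sets are made up.

```python
from typing import Dict, List, Set


def greedy_summary(sentence_topics: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily pick up to `budget` sentences that maximize the number of covered topics."""
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for sid, topics in sentence_topics.items():
            if sid in selected:
                continue
            gain = len(topics - covered)  # marginal gain of adding this sentence
            if gain > best_gain:
                best, best_gain = sid, gain
        if best is None:  # no remaining sentence adds a new topic
            break
        selected.append(best)
        covered |= sentence_topics[best]
    return selected


# Hypothetical sentence -> topics mapping.
sentence_topics = {
    "s1": {"A", "B"},
    "s2": {"B", "C", "D"},
    "s3": {"A"},
    "s4": {"D", "E"},
}
summary = greedy_summary(sentence_topics, budget=2)
print(summary)
```

Topic coverage is monotone and submodular, so for a cardinality budget this greedy procedure is guaranteed to reach at least a (1 - 1/e) fraction of the optimal coverage.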

Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration

Mar 07, 2010
Wongkot Sriurai, Phayung Meesad, Choochart Haruechaiyasak

Most Web page classification models apply the bag of words (BOW) model to represent the feature space. The original BOW representation, however, is unable to recognize semantic relationships between terms. One possible solution is to apply a topic model approach based on the Latent Dirichlet Allocation algorithm to cluster the term features into a set of latent topics. Terms assigned to the same topic are semantically related. In this paper, we propose a novel hierarchical classification method based on a topic model and the integration of additional term features from neighboring pages. Our hierarchical classification method consists of two phases: (1) feature representation using a topic model and integrating neighboring pages, and (2) a hierarchical Support Vector Machines (SVM) classification model constructed from a confusion matrix. In our experiments, the proposed hierarchical SVM model, which integrates the current page with neighboring pages via the topic model, yielded the best performance, with an accuracy of 90.33% and an F1 measure of 90.14%, an improvement of 5.12% and 5.13%, respectively, over the original SVM model.

* Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947 5500
