Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

ET-LDA: Joint Topic Modeling For Aligning, Analyzing and Sensemaking of Public Events and Their Twitter Feeds

Dec 21, 2012
Yuheng Hu, Ajita John, Fei Wang, Doree Duncan Seligmann, Subbarao Kambhampati

Social media channels such as Twitter have emerged as popular platforms for crowds to respond to public events such as speeches, sports and debates. While this promises tremendous opportunities to understand and make sense of the reception of an event from the social media, the promises come entwined with significant technical challenges. In particular, given an event and an associated large scale collection of tweets, we need approaches to effectively align tweets and the parts of the event they refer to. This in turn raises questions about how to segment the event into smaller yet meaningful parts, and how to figure out whether a tweet is a general one about the entire event or specific one aimed at a particular segment of the event. In this work, we present ET-LDA, an effective method for aligning an event and its tweets through joint statistical modeling of topical influences from the events and their associated tweets. The model enables the automatic segmentation of the events and the characterization of tweets into two categories: (1) episodic tweets that respond specifically to the content in the segments of the events, and (2) steady tweets that respond generally about the events. We present an efficient inference method for this model, and a comprehensive evaluation of its effectiveness over existing methods. In particular, through a user study, we demonstrate that users find the topics, the segments, the alignment, and the episodic tweets discovered by ET-LDA to be of higher quality and more interesting as compared to the state-of-the-art, with improvements in the range of 18-41%.

* errors in reference, delete for now 

  Access Paper or Ask Questions

Twitter Opinion Topic Model: Extracting Product Opinions from Tweets by Leveraging Hashtags and Sentiment Lexicon

Sep 21, 2016
Kar Wai Lim, Wray Buntine

Aspect-based opinion mining is widely applied to review data to aggregate or summarize opinions of a product, and the current state-of-the-art is achieved with Latent Dirichlet Allocation (LDA)-based model. Although social media data like tweets are laden with opinions, their "dirty" nature (as natural language) has discouraged researchers from applying LDA-based opinion model for product review mining. Tweets are often informal, unstructured and lacking labeled data such as categories and ratings, making it challenging for product opinion mining. In this paper, we propose an LDA-based opinion model named Twitter Opinion Topic Model (TOTM) for opinion mining and sentiment analysis. TOTM leverages hashtags, mentions, emoticons and strong sentiment words that are present in tweets in its discovery process. It improves opinion prediction by modeling the target-opinion interaction directly, thus discovering target specific opinion words, neglected in existing approaches. Moreover, we propose a new formulation of incorporating sentiment prior information into a topic model, by utilizing an existing public sentiment lexicon. This is novel in that it learns and updates with the data. We conduct experiments on 9 million tweets on electronic products, and demonstrate the improved performance of TOTM in both quantitative evaluations and qualitative analysis. We show that aspect-based opinion analysis on massive volume of tweets provides useful opinions on products.

* Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM), pp. 1319-1328. ACM. 2014 
* CIKM paper 

  Access Paper or Ask Questions

Technical Progress Analysis Using a Dynamic Topic Model for Technical Terms to Revise Patent Classification Codes

Dec 18, 2020
Mana Iwata, Yoshiro Matsuda, Yoshimasa Utsumi, Yoshitoshi Tanaka, Kazuhide Nakata

Japanese patents are assigned a patent classification code, FI (File Index), that is unique to Japan. FI is a subdivision of the IPC, an international patent classification code, that is related to Japanese technology. FIs are revised to keep up with technological developments. These revisions have already established more than 30,000 new FIs since 2006. However, these revisions require a lot of time and workload. Moreover, these revisions are not automated and are thus inefficient. Therefore, using machine learning to assist in the revision of patent classification codes (FI) will lead to improved accuracy and efficiency. This study analyzes patent documents from this new perspective of assisting in the revision of patent classification codes with machine learning. To analyze time-series changes in patents, we used the dynamic topic model (DTM), which is an extension of the latent Dirichlet allocation (LDA). Also, unlike English, the Japanese language requires morphological analysis. Patents contain many technical words that are not used in everyday life, so morphological analysis using a common dictionary is not sufficient. Therefore, we used a technique for extracting technical terms from text. After extracting technical terms, we applied them to DTM. In this study, we determined the technological progress of the lighting class F21 for 14 years and compared it with the actual revision of patent classification codes. In other words, we extracted technical terms from Japanese patents and applied DTM to determine the progress of Japanese technology. Then, we analyzed the results from the new perspective of revising patent classification codes with machine learning. As a result, it was found that those whose topics were on the rise were judged to be new technologies.

  Access Paper or Ask Questions

Bayesian Allocation Model: Inference by Sequential Monte Carlo for Nonnegative Tensor Factorizations and Topic Models using Polya Urns

Mar 11, 2019
Ali Taylan Cemgil, Mehmet Burak Kurutmaz, Sinan Yildirim, Melih Barsbey, Umut Simsekli

We introduce a dynamic generative model, Bayesian allocation model (BAM), which establishes explicit connections between nonnegative tensor factorization (NTF), graphical models of discrete probability distributions and their Bayesian extensions, and the topic models such as the latent Dirichlet allocation. BAM is based on a Poisson process, whose events are marked by using a Bayesian network, where the conditional probability tables of this network are then integrated out analytically. We show that the resulting marginal process turns out to be a Polya urn, an integer valued self-reinforcing process. This urn processes, which we name a Polya-Bayes process, obey certain conditional independence properties that provide further insight about the nature of NTF. These insights also let us develop space efficient simulation algorithms that respect the potential sparsity of data: we propose a class of sequential importance sampling algorithms for computing NTF and approximating their marginal likelihood, which would be useful for model selection. The resulting methods can also be viewed as a model scoring method for topic models and discrete Bayesian networks with hidden variables. The new algorithms have favourable properties in the sparse data regime when contrasted with variational algorithms that become more accurate when the total sum of the elements of the observed tensor goes to infinity. We illustrate the performance on several examples and numerically study the behaviour of the algorithms for various data regimes.

* 70 pages, 16 figures 

  Access Paper or Ask Questions

A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization

Dec 27, 2013
Jiwei Li, Sujian Li

Both supervised learning methods and LDA based topic model have been successfully applied in the field of query focused multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantages of both topic model and feature based supervised learning methods. Experiments on TAC2008 and TAC2009 demonstrate the effectiveness of our approach.

* This paper has been withdrawn by the author due to a crucial sign error in equation 

  Access Paper or Ask Questions

The Evolution of Popularity and Images of Characters in Marvel Cinematic Universe Fanfictions

May 10, 2018
Fan Bu

This analysis proposes a new topic model to study the yearly trends in Marvel Cinematic Universe fanfictions on three levels: character popularity, character images/topics, and vocabulary pattern of topics. It is found that character appearances in fanfictions have become more diverse over the years thanks to constant introduction of new characters in feature films, and in the case of Captain America, multi-dimensional character development is well-received by the fanfiction world.

  Access Paper or Ask Questions

Gibbs Sampling Strategies for Semantic Perception of Streaming Video Data

Sep 10, 2015
Yogesh Girdhar, Gregory Dudek

Topic modeling of streaming sensor data can be used for high level perception of the environment by a mobile robot. In this paper we compare various Gibbs sampling strategies for topic modeling of streaming spatiotemporal data, such as video captured by a mobile robot. Compared to previous work on online topic modeling, such as o-LDA and incremental LDA, we show that the proposed technique results in lower online and final perplexity, given the realtime constraints.

  Access Paper or Ask Questions

Alquist 2.0: Alexa Prize Socialbot Based on Sub-Dialogue Models

Nov 06, 2020
Jan Pichl, Petr Marek, Jakub Konrád, Martin Matulík, Jan Šedivý

This paper presents the second version of the dialogue system named Alquist competing in Amazon Alexa Prize 2018. We introduce a system leveraging ontology-based topic structure called topic nodes. Each of the nodes consists of several sub-dialogues, and each sub-dialogue has its own LSTM-based model for dialogue management. The sub-dialogues can be triggered according to the topic hierarchy or a user intent which allows the bot to create a unique experience during each session.

  Access Paper or Ask Questions

Adversarial Learning for Zero-Shot Stance Detection on Social Media

May 14, 2021
Emily Allaway, Malavika Srikanth, Kathleen McKeown

Stance detection on social media can help to identify and understand slanted news or commentary in everyday life. In this work, we propose a new model for zero-shot stance detection on Twitter that uses adversarial learning to generalize across topics. Our model achieves state-of-the-art performance on a number of unseen test topics with minimal computational costs. In addition, we extend zero-shot stance detection to new topics, highlighting future directions for zero-shot transfer.

* To appear in NAACL 2021 

  Access Paper or Ask Questions

Modeling Word Relatedness in Latent Dirichlet Allocation

Nov 10, 2014
Xun Wang

Standard LDA model suffers the problem that the topic assignment of each word is independent and word correlation hence is neglected. To address this problem, in this paper, we propose a model called Word Related Latent Dirichlet Allocation (WR-LDA) by incorporating word correlation into LDA topic models. This leads to new capabilities that standard LDA model does not have such as estimating infrequently occurring words or multi-language topic modeling. Experimental results demonstrate the effectiveness of our model compared with standard LDA.

  Access Paper or Ask Questions