"Topic Modeling": models, code, and papers

Bayesian Nonparametric Modeling of Driver Behavior using HDP Split-Merge Sampling Algorithm

Jan 27, 2018
Vadim Smolyakov, Julian Straub, Sue Zheng, John W. Fisher III

Modern vehicles are equipped with increasingly complex sensors. These sensors generate large volumes of data that provide opportunities for modeling and analysis. Here, we are interested in exploiting this data to learn aspects of behaviors and the road network associated with individual drivers. Our dataset is collected on a standard vehicle used to commute to work and for personal trips. A Hidden Markov Model (HMM) trained on the GPS position and orientation data is utilized to compress the large amount of position information into a small number of road segment states. Each state has a set of observations, i.e. car signals, associated with it that are quantized and modeled as draws from a Hierarchical Dirichlet Process (HDP). The inference for the topic distributions is carried out using an HDP split-merge sampling algorithm. The topic distributions over joint quantized car signals characterize the driving situation in the respective road state. In a novel manner, we demonstrate how the sparsity of the personal road network of a driver in conjunction with a hierarchical topic model allows data-driven predictions about destinations as well as likely road conditions.
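
The quantization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes two car signals (speed and brake pressure), hypothetical bin edges, and road-state IDs already produced by an HMM; none of these specifics come from the paper, and the HDP inference itself is not shown. The sketch only turns continuous signals into joint discrete tokens and counts them per road state, which is the kind of bag-of-words input a topic model would consume.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical bin edges; the abstract does not specify the quantization used.
SPEED_EDGES_KMH = [20.0, 50.0, 90.0]  # yields 4 speed bins: 0..3
BRAKE_EDGES = [0.1, 0.5]              # yields 3 brake-pressure bins: 0..2


def quantize(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


def joint_word(speed_kmh, brake):
    """Combine the per-signal bin indices into one discrete token."""
    return (quantize(speed_kmh, SPEED_EDGES_KMH), quantize(brake, BRAKE_EDGES))


def words_per_state(records):
    """Count joint tokens for each road-segment state.

    `records` is an iterable of (state_id, speed_kmh, brake) tuples, where
    state_id is assumed to come from an already-trained HMM.
    """
    counts = defaultdict(Counter)
    for state_id, speed_kmh, brake in records:
        counts[state_id][joint_word(speed_kmh, brake)] += 1
    return counts


# Made-up example records: (road state, speed in km/h, brake pressure)
records = [
    (0, 12.0, 0.0),
    (0, 15.0, 0.6),
    (0, 18.0, 0.0),
    (1, 95.0, 0.0),
    (1, 88.0, 0.2),
]
bags = words_per_state(records)
```

Here `bags[0]` holds the token counts for road state 0. Each such bag plays the role of a "document" whose "words" are joint quantized signals.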


Modeling the Experience of Emotion

Mar 04, 2009
Joost Broekens

Affective computing has proven to be a viable field of research, comprising a large number of multidisciplinary researchers and resulting in work that is widely published. The majority of this work consists of computational models of emotion recognition, computational modeling of causal factors of emotion and emotion expression through rendered and robotic faces. A smaller part is concerned with modeling the effects of emotion, formal modeling of cognitive appraisal theory and models of emergent emotions. Part of the motivation for affective computing as a field is to better understand emotional processes through computational modeling. One of the four major topics in affective computing is computers that have emotions (the others are recognizing, expressing and understanding emotions). A critical and neglected aspect of having emotions is the experience of emotion (Barrett, Mesquita, Ochsner, and Gross, 2007): what does the content of an emotional episode look like, how does this content change over time and when do we call the episode emotional. Few modeling efforts have these topics as their primary focus. The launch of a journal on synthetic emotions should motivate research initiatives in this direction, and this research should have a measurable impact on emotion research in psychology. I show that a good way to do so is to investigate the psychological core of what an emotion is: an experience. I present ideas on how the experience of emotion could be modeled and provide evidence that several computational models of emotion are already addressing the issue.


Learning Methods for Dynamic Topic Modeling in Automated Behaviour Analysis

Sep 18, 2017
Olga Isupova, Danil Kuzin, Lyudmila Mihaylova

Semi-supervised and unsupervised systems provide operators with invaluable support and can tremendously reduce the operators' load. In light of the necessity to process large volumes of video data and provide autonomous decisions, this work proposes new learning algorithms for activity analysis in video. The activities and behaviours are described by a dynamic topic model. Two novel learning algorithms based on the expectation maximisation approach and variational Bayes inference are proposed. Theoretical derivations of the posterior of model parameters are given. The designed learning algorithms are compared with the Gibbs sampling inference scheme introduced earlier in the literature. A detailed comparison of the learning algorithms is presented on real video data. We also propose an anomaly localisation procedure, elegantly embedded in the topic modeling framework. The proposed framework can be applied to a number of areas, including transportation systems, security and surveillance.

* 15 pages 

TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

Feb 27, 2017
Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley

In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence - both semantic and syntactic - but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis on the IMDB movie review dataset and report an error rate of $6.28\%$. This is comparable to the state-of-the-art $5.91\%$ resulting from a semi-supervised approach. Finally, TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation.

* International Conference on Learning Representations 

Dirichlet Variational Autoencoder for Text Modeling

Oct 31, 2018
Yijun Xiao, Tiancheng Zhao, William Yang Wang

We introduce an improved variational autoencoder (VAE) for text modeling with topic information explicitly modeled as a Dirichlet latent variable. Giving the proposed model topic awareness makes it better at reconstructing input texts. Furthermore, due to the inherent interactions between the newly introduced Dirichlet variable and the conventional multivariate Gaussian variable, the model is less prone to KL divergence vanishing. We derive the variational lower bound for the new model and conduct experiments on four different data sets. The results show that the proposed model is superior at text reconstruction across the latent space and that classification on the learned representations achieves higher test accuracy.


Topic Modeling Based Extractive Text Summarization

Jun 29, 2021
Kalliath Abdul Rasheed Issam, Shivam Patel, Subalalitha C. N

Text summarization is an approach for identifying important information present within text documents. This computational technique aims to generate shorter versions of the source text by including only the relevant and salient information present within the source text. In this paper, we propose a novel method to summarize a text document by clustering its contents based on latent topics produced using topic modeling techniques and by generating extractive summaries for each of the identified text clusters. All extractive sub-summaries are later combined to generate a summary for any given source document. We utilize the less commonly used and challenging WikiHow dataset in our approach to text summarization. This dataset is unlike the commonly used news datasets which are available for text summarization. The well-known news datasets present their most important information in the first few lines of their source texts, which makes their summarization a less challenging task than summarizing the WikiHow dataset. In contrast to these news datasets, the documents in the WikiHow dataset are written using a generalized approach and have lower abstractedness and a higher compression ratio, thus posing a greater challenge for generating summaries. Many of the current state-of-the-art text summarization techniques tend to eliminate important information present in source documents in favor of brevity. Our proposed technique aims to capture all the varied information present in source documents. Although the dataset proved challenging, after performing extensive tests within our experimental setup, we have discovered that our model produces encouraging ROUGE results and summaries when compared to the other published extractive and abstractive text summarization models.

* International Journal of Innovative Technology and Exploring Engineering, Volume-9 Issue-6, April 2020, Page No. 1710-1719 
* 10 pages, 13 figures, 3 tables 
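
The pipeline outlined in this abstract (cluster content by latent topic, extract a sub-summary per cluster, combine the sub-summaries) can be sketched roughly as follows. This is not the authors' implementation: the topic-word weights in `TOPIC_WORDS` are hypothetical stand-ins for the output of a topic model, and the per-cluster extraction is a simple "keep the highest-scoring sentence" rule chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical topic-word weights, standing in for the output of a topic model.
TOPIC_WORDS = {
    "cooking": {"boil": 1.0, "water": 0.8, "pasta": 1.0, "salt": 0.6},
    "serving": {"plate": 1.0, "sauce": 0.9, "serve": 1.0, "cheese": 0.7},
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def topic_scores(sentence):
    """Score a sentence against every topic by summing its word weights."""
    tokens = tokenize(sentence)
    return {
        topic: sum(weights.get(tok, 0.0) for tok in tokens)
        for topic, weights in TOPIC_WORDS.items()
    }


def summarize(sentences, per_topic=1):
    """Cluster sentences by best topic, keep the top ones, restore order."""
    clusters = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        scores = topic_scores(sentence)
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # skip sentences that match no topic at all
            clusters[best].append((scores[best], idx))
    chosen = []
    for members in clusters.values():
        # Highest score first; earlier sentence wins a tie.
        members.sort(key=lambda m: (-m[0], m[1]))
        chosen.extend(idx for _, idx in members[:per_topic])
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "Boil water in a large pot and add salt.",
    "Add the pasta to the water.",
    "Pour the sauce over the pasta on a plate.",
    "Serve with cheese.",
    "Enjoy your meal.",
]
summary = summarize(sentences)
```

With these made-up weights, `summary` keeps one sentence from each of the two clusters, in their original order. Emitting the selected sentences in document order is a design choice made here for readability and is not something the abstract specifies.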

Multi-label Dataless Text Classification with Topic Modeling

Nov 05, 2017
Daochen Zha, Chenliang Li

Manually labeling documents is tedious and expensive, but it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing works mainly center on single-label classification problems, that is, each document is restricted to belonging to a single category. In this paper, we propose a novel Seed-guided Multi-label Topic Model, named SMTM. With a few seed words relevant to each category, SMTM conducts multi-label classification for a collection of documents without any labeled document. In SMTM, each category is associated with a single category-topic which covers the meaning of the category. To accommodate multi-labeled documents, we explicitly model the category sparsity in SMTM by using a spike-and-slab prior and a weak smoothing prior. That is, without using any threshold tuning, SMTM automatically selects the relevant categories for each document. To incorporate the supervision of the seed words, we propose a seed-guided biased GPU (i.e., generalized Polya urn) sampling procedure to guide the topic inference of SMTM. Experiments on two public datasets show that SMTM achieves better classification accuracy than state-of-the-art alternatives and even outperforms supervised solutions in some scenarios.


Partial Membership Latent Dirichlet Allocation

Dec 28, 2016
Chao Chen, Alina Zare, Huy Trinh, Gbeng Omotara, J. Tory Cobb, Timotius Lagaunne

Topic models (e.g., pLSA, LDA, sLDA) have been widely used for segmenting imagery. However, these models are confined to crisp segmentation, forcing a visual word (i.e., an image patch) to belong to one and only one topic. Yet, there are many images in which some regions cannot be assigned a crisp categorical label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, we present a partial membership latent Dirichlet allocation (PM-LDA) model and an associated parameter estimation algorithm. This model can be useful for imagery where a visual word may be a mixture of multiple topics. Experimental results on visual and sonar imagery show that PM-LDA can produce both crisp and soft semantic image segmentations, a capability previous topic modeling methods do not have.

* Version 1, Sent for Review. arXiv admin note: substantial text overlap with arXiv:1511.02821 

Semantic Knowledge Discovery and Discussion Mining of Incel Online Community: Topic modeling

Apr 21, 2021
Hamed Jelodar, Richard Frank

Online forums provide a unique opportunity for online users to share comments and exchange information on a particular topic. Understanding user behaviour is valuable to organizations and has applications for social and security strategies, for instance, identifying user opinions within a community or predicting future behaviour. Discovering the semantic aspects in Incel forums is the main goal of this research; we apply natural language processing techniques based on topic modeling to latent topic discovery and opinion mining of users from a popular online Incel discussion forum. To prepare the input data for our study, we extracted the comments from The research experiments show that Artificial Intelligence (AI) based on NLP models can be effective for semantic and emotion knowledge discovery and retrieval of useful information from the Incel community. For example, we discovered semantic-related words that describe issues within a large volume of Incel comments, which is difficult with manual methods.