
"Topic": models, code, and papers

Residual Belief Propagation for Topic Modeling

Apr 30, 2012
Jia Zeng, Xiao-Qin Cao, Zhi-Qiang Liu

Fast convergence is a desired property when training latent Dirichlet allocation (LDA), especially in online and parallel topic modeling for massive data sets. This paper presents a novel residual belief propagation (RBP) algorithm to accelerate convergence when training LDA. The proposed RBP uses an informed scheduling scheme for asynchronous message passing, which passes fast-convergent messages with a higher priority so that they influence slow-convergent messages at each learning iteration. Extensive empirical studies confirm that RBP significantly reduces the training time until convergence while achieving a much lower predictive perplexity than other state-of-the-art training algorithms for LDA, including variational Bayes (VB), collapsed Gibbs sampling (GS), loopy belief propagation (BP), and residual VB (RVB).
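
The informed scheduling at the heart of RBP can be sketched independently of the LDA message equations: a priority queue orders updates by their last residual, so the fastest-changing messages are refined first. The scalar update below is a hypothetical stand-in for the real messages:

```python
import heapq

def residual_schedule(messages, update, tol=1e-6, max_steps=1000):
    """Asynchronously update messages, always picking the one whose last
    change (residual) was largest -- the core scheduling idea of RBP."""
    # max-heap keyed by negative residual; start everything at top priority
    heap = [(-float("inf"), i) for i in range(len(messages))]
    heapq.heapify(heap)
    steps = 0
    while heap and steps < max_steps:
        _, i = heapq.heappop(heap)
        new = update(messages, i)
        res = abs(new - messages[i])
        messages[i] = new
        if res > tol:  # converged messages drop out of the queue
            heapq.heappush(heap, (-res, i))
        steps += 1
    return messages, steps

# Toy update: each message moves halfway toward a fixed target.
targets = [1.0, 2.0, 3.0]
msgs, steps = residual_schedule([0.0, 0.0, 0.0],
                                lambda m, i: 0.5 * (m[i] + targets[i]))
```

With this toy update, large-residual messages are processed first and the queue empties once every residual falls below the tolerance.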

* Advanced Data Mining and Applications Lecture Notes in Computer Science Volume 7713, 739-752, 2012 
* 6 pages, 8 figures 

Discriminative Topic Modeling with Logistic LDA

Sep 03, 2019
Iryna Korshunova, Hanchen Xiong, Mateusz Fedoryszak, Lucas Theis

Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging. Yet many problems with much richer data share a similar structure and could benefit from the vast literature on LDA. We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation which is easy to apply to arbitrary inputs. In particular, our model can easily be applied to groups of images and arbitrary text embeddings, and it integrates well with deep neural networks. Although it is a discriminative model, we show that logistic LDA can learn from unlabeled data in an unsupervised manner by exploiting the group structure present in the data. In contrast to other recent topic models designed to handle arbitrary inputs, our model does not sacrifice the interpretability and principled motivation of LDA.

* Advances in Neural Information Processing Systems 32, 2019 

Deep Belief Nets for Topic Modeling

Jan 18, 2015
Lars Maaloe, Morten Arngren, Ole Winther

Applying traditional collaborative filtering to digital publishing is challenging because user data is very sparse due to the high volume of documents relative to the number of users. Content-based approaches, on the other hand, are attractive because textual content is often very informative. In this paper we describe large-scale content-based collaborative filtering for digital publishing. To solve the digital publishing recommender problem we compare two approaches: latent Dirichlet allocation (LDA) and deep belief nets (DBN), both of which find low-dimensional latent representations for documents. Efficient retrieval can be carried out in the latent representation. We work both on public benchmarks and on digital media content provided by Issuu, an online publishing platform. This article also comes with a newly developed deep belief nets toolbox for topic modeling, tailored towards performance evaluation of the DBN model and comparisons with the LDA model.
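
Retrieval in the latent space, which both LDA and the DBN enable, reduces to nearest-neighbour search over low-dimensional document vectors. A minimal sketch with cosine similarity over hypothetical 2-dimensional topic proportions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two latent document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec, doc_vecs, k=3):
    """Rank documents by similarity of their latent (LDA/DBN) vectors."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical latent representations for three documents.
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
top = retrieve([1.0, 0.0], doc_vecs, k=2)
```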

* Accepted to the ICML-2014 Workshop on Knowledge-Powered Deep Learning for Text Mining 

Multi-Topic Multi-Document Summarizer

Jan 03, 2014
Fatma El-Ghannam, Tarek El-Shishtawy

Current multi-document summarization systems can successfully extract summary sentences, but with many limitations, including low coverage, inaccurate extraction of important sentences, redundancy, and poor coherence among the selected sentences. The present study introduces a new concept of the centroid approach and reports new techniques for extracting summary sentences from multiple documents. In both techniques, keyphrases are used to weight sentences and documents. The first summarization technique (Sen-Rich) prefers maximum-richness sentences, while the second (Doc-Rich) prefers sentences from the centroid document. To demonstrate the application of the new summarization system to Arabic documents, we performed two experiments. First, we applied the ROUGE measure to compare the new techniques with systems presented at TAC 2011. The results show that Sen-Rich outperformed all systems in ROUGE-S. Second, the system was applied to summarize multi-topic documents. Using human evaluators, the results show that Doc-Rich is superior, with summary sentences characterized by greater coverage and more cohesion.
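
The keyphrase-based sentence weighting can be illustrated with a toy scorer; the plain occurrence count below is a hypothetical stand-in for the paper's actual richness weights:

```python
def sentence_richness(sentence, keyphrases):
    """Count how many keyphrases occur in a sentence (toy richness score)."""
    s = sentence.lower()
    return sum(1 for kp in keyphrases if kp.lower() in s)

def summarize(sentences, keyphrases, n=2):
    """Sen-Rich style selection: prefer the sentences richest in keyphrases."""
    ranked = sorted(sentences,
                    key=lambda s: sentence_richness(s, keyphrases),
                    reverse=True)
    return ranked[:n]

summary = summarize(["Topic models find themes.",
                     "The cat sat.",
                     "Latent topic structure helps retrieval."],
                    ["topic", "retrieval"], n=1)
```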

* International Journal of Computer Science & Information Technology (IJCSIT) Vol 5, No 6, December 2013 

A simple non-parametric Topic Mixture for Authors and Documents

Dec 04, 2012
Arnim Bleier

This article reviews the Author-Topic Model and presents a new non-parametric extension based on the Hierarchical Dirichlet Process. The extension is especially suitable when no prior information is available about the number of components necessary. A blocked Gibbs sampler is described, and the focus is on staying as close as possible to the original model with only the minimum necessary theoretical and implementation overhead.
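
The non-parametric behaviour the extension relies on, i.e. the number of components not being fixed in advance, is easiest to see in the Chinese restaurant process underlying the Hierarchical Dirichlet Process. A stdlib sketch:

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Chinese restaurant process: each item joins an existing component
    with probability proportional to its size, or opens a new component
    with probability proportional to alpha. The number of components
    therefore grows with the data rather than being fixed in advance."""
    rng = random.Random(seed)
    counts = []   # size of each component so far
    labels = []   # component index assigned to each item
    for i in range(n):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                labels.append(k)
                break
        else:
            counts.append(1)              # open a new component
            labels.append(len(counts) - 1)
    return labels, counts

labels, counts = crp_assignments(100, alpha=1.0, seed=0)
```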

Detecting Sub-Topic Correspondence through Bipartite Term Clustering

Aug 01, 1999
Zvika Marx, Ido Dagan, Eli Shamir

This paper addresses the novel task of detecting sub-topic correspondence in a pair of text fragments, enhancing common notions of text similarity. The task is addressed by coupling corresponding term subsets through bipartite clustering. The paper presents a cost-based clustering scheme and compares it with a bipartite version of the single-link method, providing illustrative results.
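
The single-link baseline admits a compact sketch: any sufficiently similar cross-fragment term pair links its terms into one coupled cluster, which union-find makes explicit. The term pairs below are hypothetical and stand in for pairs whose similarity exceeds some threshold:

```python
def bipartite_single_link(links):
    """Single-link style coupling over a bipartite term graph: each link
    (left_term, right_term) merges its endpoints into one cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:          # path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in links:
        union(("L", a), ("R", b))      # tag terms by fragment side
    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), []).append(node)
    return [sorted(c) for c in clusters.values()]

clusters = bipartite_single_link([("bank", "finance"),
                                  ("bank", "money"),
                                  ("river", "water")])
```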

* Proceedings of ACL'99 Workshop on Unsupervised Learning in Natural Language Processing, 1999, pp 45-51 
* html with 3 gif figures; generated from 7 pages MS-Word file 

Unsupervised Topic Adaptation for Lecture Speech Retrieval

Jul 10, 2004
Atsushi Fujii, Katunobu Itou, Tomoyosi Akiba, Tetsuya Ishikawa

We are developing a cross-media information retrieval system, in which users can view specific segments of lecture videos by submitting text queries. To produce a text index, the audio track is extracted from a lecture video and a transcription is generated by automatic speech recognition. In this paper, to improve the quality of our retrieval system, we extensively investigate the effects of adapting acoustic and language models on speech recognition. We apply an MLLR-based method to adapt the acoustic model. To obtain a corpus for language model adaptation, we use the textbook for a target lecture to search a Web collection for pages associated with the lecture topic. We show the effectiveness of our method by means of experiments.
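
A standard way to use such a Web-derived corpus is to estimate a topic-adapted language model and interpolate it with the background model. A minimal unigram sketch (the interpolation weight and word probabilities are hypothetical; the paper works with full recognition-grade models):

```python
def interpolate_lm(p_adapt, p_background, lam=0.7):
    """Linear interpolation of a topic-adapted LM with a background LM:
    P(w) = lam * P_adapt(w) + (1 - lam) * P_bg(w)."""
    vocab = set(p_adapt) | set(p_background)
    return {w: lam * p_adapt.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0)
            for w in vocab}

# Hypothetical unigram distributions.
p = interpolate_lm({"lecture": 0.6, "the": 0.4},
                   {"the": 0.9, "cat": 0.1}, lam=0.5)
```

Because both inputs are proper distributions, the interpolated model still sums to one.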

* Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP 2004), pp.2957-2960, Oct. 2004 
* 4 pages 

SentiBubbles: Topic Modeling and Sentiment Visualization of Entity-centric Tweets

Jan 23, 2018
João Oliveira, Mike Pinto, Pedro Saleiro, Jorge Teixeira

Social media users tend to mention entities when reacting to news events. The main purpose of this work is to create entity-centric aggregations of tweets on a daily basis. By applying topic modeling and sentiment analysis, we create data visualization insights about current events and people's reactions to those events from an entity-centric perspective.
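
The entity-centric aggregation step has a direct sketch: group tweet-level sentiment scores by mentioned entity and reduce per entity. Entity names and scores below are hypothetical:

```python
from collections import defaultdict

def aggregate_by_entity(tweets):
    """Group (entity, sentiment) tweet records per entity and average the
    sentiment -- the daily entity-centric aggregation step."""
    buckets = defaultdict(list)
    for entity, sentiment in tweets:
        buckets[entity].append(sentiment)
    return {e: sum(s) / len(s) for e, s in buckets.items()}

agg = aggregate_by_entity([("acme", 1.0), ("acme", -1.0), ("globex", 0.5)])
```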

Bayesian Nonparametrics in Topic Modeling: A Brief Tutorial

Jan 16, 2015
Alexander Spangher

Using nonparametric methods has been increasingly explored in Bayesian hierarchical modeling as a way to increase model flexibility. Although the field shows a lot of promise, inference in many models, including the Hierarchical Dirichlet Process (HDP), remains prohibitively slow. One promising path forward is to exploit the submodularity inherent in the Indian Buffet Process (IBP) to derive near-optimal solutions in polynomial time. In this work, I will present a brief tutorial on Bayesian nonparametric methods, especially as they are applied to topic modeling. I will show a comparison between different non-parametric models and the current state-of-the-art parametric model, latent Dirichlet allocation (LDA).

* 7 pages, unpublished 

Improving Context Modeling in Neural Topic Segmentation

Oct 07, 2020
Linzi Xing, Brad Hackinen, Giuseppe Carenini, Francesco Trebbi

Topic segmentation is critical in key NLP tasks, and recent works favor highly effective neural supervised approaches. However, current neural solutions are arguably limited in how they model context. In this paper, we enhance a segmenter based on a hierarchical attention BiLSTM network to better model context by adding a coherence-related auxiliary task and restricted self-attention. Our optimized segmenter outperforms SOTA approaches when trained and tested on three datasets. We also demonstrate the robustness of our proposed model in a domain transfer setting by training a model on a large-scale dataset and testing it on four challenging real-world benchmarks. Furthermore, we apply our proposed strategy to two other languages (German and Chinese) and show its effectiveness in multilingual scenarios.
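
The "restricted" part of restricted self-attention can be sketched in isolation: each sentence position attends only to neighbours within a fixed window. Single head, no learned projections, and hypothetical 2-dimensional sentence vectors:

```python
import math

def restricted_self_attention(seq, window=1):
    """Dot-product self-attention where position i attends only to
    positions within `window` steps -- the restricted-attention idea."""
    n = len(seq)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = [sum(a * b for a, b in zip(seq[i], seq[j]))
                  for j in range(lo, hi)]
        m = max(scores)                      # stabilized softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # output is a convex combination of the windowed neighbours
        out.append([sum(w * seq[j][d]
                        for w, j in zip(weights, range(lo, hi)))
                    for d in range(len(seq[i]))])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
out = restricted_self_attention(seq, window=1)
```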

* Accepted at AACL-IJCNLP 2020 
