Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox


Document Informed Neural Autoregressive Topic Models with Distributional Prior

Sep 15, 2018
Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schütze


Share this with someone who'll enjoy it:


We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., "networks" used in the contexts artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named as iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named as DocNADE2 and iDocNADE2. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains.

* AAAI2019. arXiv admin note: substantial text overlap with arXiv:1808.03793 


   Access Paper Source



Share this with someone who'll enjoy it: