"Topic": models, code, and papers

Sparse Stochastic Inference for Latent Dirichlet allocation

Jun 27, 2012
David Mimno, Matt Hoffman, David Blei

We present a hybrid algorithm for Bayesian topic models that combines the efficiency of sparse Gibbs sampling with the scalability of online stochastic inference. We used our algorithm to analyze a corpus of 1.2 million books (33 billion words) with thousands of topics. Our approach reduces the bias of variational inference and generalizes to many Bayesian hidden-variable models.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012) 

Literature Review: Human Segmentation with Static Camera

Oct 28, 2019
Jiaxin Xu, Rui Wang, Vaibhav Rakheja

Our research topic is human segmentation with a static camera. This topic can be divided into three sub-tasks: object detection, instance identification, and segmentation. These sub-tasks are closely related, and progress in each has a great impact on the other two. In this literature review, we first introduce the background of human segmentation and then discuss issues in these three fields and how they interact with each other.

Reddit-TUDFE: practical tool to explore Reddit usability in data science and knowledge processing

Oct 05, 2021
Jan Sawicki, Maria Ganzha, Marcin Paprzycki

This contribution argues that Reddit, as a massive, categorized, open-access dataset, can be used to conduct knowledge capture on "almost any topic". The presented analysis is based on 180 manually annotated papers related to Reddit and on data acquired from top databases of scientific papers. Moreover, an open-source tool is introduced that provides easy access to Reddit resources and supports exploratory data analysis of how Reddit covers selected topics.

Unsupervised Keyphrase Extraction with Multipartite Graphs

Apr 16, 2018
Florian Boudin

We propose an unsupervised keyphrase extraction model that encodes topical information within a multipartite graph structure. Our model represents keyphrase candidates and topics in a single graph and exploits their mutually reinforcing relationship to improve candidate ranking. We further introduce a novel mechanism to incorporate keyphrase selection preferences into the model. Experiments conducted on three widely used datasets show significant improvements over state-of-the-art graph-based models.

* Accepted at NAACL 2018 

Scientific Paper Summarization Using Citation Summary Networks

Jul 10, 2008
Vahed Qazvinian, Dragomir R. Radev

Quickly moving to a new area of research is painful for researchers due to the vast amount of scientific literature in each field of study. One possible way to overcome this problem is to summarize a scientific topic. In this paper, we propose a model of summarizing a single article, which can be further used to summarize an entire topic. Our model is based on analyzing others' viewpoint of the target article's contributions and the study of its citation summary network using a clustering approach.

SCAT: Second Chance Autoencoder for Textual Data

May 11, 2020
Somaieh Goudarzvand, Gharib Gharibi, Yugyung Lee

We present a k-competitive learning approach for textual autoencoders named Second Chance Autoencoder (SCAT). SCAT selects the $k$ largest and smallest positive activations as the winner neurons, which gain the activation values of the loser neurons during the learning process, and thus focus on retrieving well-representative features for topics. Our experiments show that SCAT achieves outstanding performance in classification, topic modeling, and document visualization compared to LDA, K-Sparse, NVCTM, and KATE.

Ex-Twit: Explainable Twitter Mining on Health Data

May 24, 2019
Tunazzina Islam

Because most machine learning models provide no explanations for their predictions, those predictions are opaque to humans. The ability to explain a model's prediction has become a necessity in many applications, including Twitter mining. In this work, we propose a method called Explainable Twitter Mining (Ex-Twit), which combines Topic Modeling and Local Interpretable Model-agnostic Explanations (LIME) to predict the topic and explain the model predictions. We demonstrate the effectiveness of Ex-Twit on Twitter health-related data.

* In SocialNLP 2019 @ IJCAI-2019 

Datasets and Models for Authorship Attribution on Italian Personal Writings

Nov 16, 2020
Gaetana Ruggiero, Albert Gatt, Malvina Nissim

Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g., novels), mainly in English. We approach AA via Authorship Verification (AV) on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overtake more specific aspects of personal style.

* Accepted for publication in: 7th Italian Conference on Computational Linguistics (CLIC-IT 2020) 

Dynamic Search -- Optimizing the Game of Information Seeking

Sep 26, 2019
Zhiwen Tang, Grace Hui Yang

This article presents the emerging topic of dynamic search (DS). To position dynamic search in a larger research landscape, the article discusses in detail its relationship to related research topics and disciplines. The article reviews approaches to modeling dynamics during information seeking, with an emphasis on Reinforcement Learning (RL)-enabled methods. Details are given for how different approaches are used to model interactions among the human user, the search system, and the environment. The paper ends with a review of evaluations of dynamic search systems.

Linking Social Media Posts to News with Siamese Transformers

Jan 10, 2020
Jacob Danovitch

Many computational social science projects examine online discourse surrounding a specific trending topic. These works often involve the acquisition of large-scale corpora relevant to the event in question to analyze aspects of the response to the event. Keyword searches present a precision-recall trade-off, and crowd-sourced annotations, while effective, are costly. This work aims to enable automatic and accurate ad-hoc retrieval of comments discussing a trending topic from a large corpus, using only a handful of seed news articles.
