We present a hybrid algorithm for Bayesian topic models that combines the efficiency of sparse Gibbs sampling with the scalability of online stochastic inference. We used our algorithm to analyze a corpus of 1.2 million books (33 billion words) with thousands of topics. Our approach reduces the bias of variational inference and generalizes to many Bayesian hidden-variable models.
Our research topic is Human segmentation with static camera. This topic can be divided into three sub-tasks, which are object detection, instance identification and segmentation. These sub-tasks are three closely related subjects. The development of each subject has great impact on the other two fields. In this literature review, we will first introduce the background of human segmentation and then talk about issues related to the above three fields as well as how they interact with each other.
This contribution argues that Reddit, as a massive, categorized, open-access dataset, can be used to conduct knowledge capture on "almost any topic". Presented analysis, is based on 180 manually annotated papers related to Reddit and data acquired from top databases of scientific papers. Moreover, an open source tool is introduced, which provides easy access to Reddit resources, and exploratory data analysis of how Reddit covers selected topics.
We propose an unsupervised keyphrase extraction model that encodes topical information within a multipartite graph structure. Our model represents keyphrase candidates and topics in a single graph and exploits their mutually reinforcing relationship to improve candidate ranking. We further introduce a novel mechanism to incorporate keyphrase selection preferences into the model. Experiments conducted on three widely used datasets show significant improvements over state-of-the-art graph-based models.
Quickly moving to a new area of research is painful for researchers due to the vast amount of scientific literature in each field of study. One possible way to overcome this problem is to summarize a scientific topic. In this paper, we propose a model of summarizing a single article, which can be further used to summarize an entire topic. Our model is based on analyzing others' viewpoint of the target article's contributions and the study of its citation summary network using a clustering approach.
We present a k-competitive learning approach for textual autoencoders named Second Chance Autoencoder (SCAT). SCAT selects the $k$ largest and smallest positive activations as the winner neurons, which gain the activation values of the loser neurons during the learning process, and thus focus on retrieving well-representative features for topics. Our experiments show that SCAT achieves outstanding performance in classification, topic modeling, and document visualization compared to LDA, K-Sparse, NVCTM, and KATE.
Since most machine learning models provide no explanations for the predictions, their predictions are obscure for the human. The ability to explain a model's prediction has become a necessity in many applications including Twitter mining. In this work, we propose a method called Explainable Twitter Mining (Ex-Twit) combining Topic Modeling and Local Interpretable Model-agnostic Explanation (LIME) to predict the topic and explain the model predictions. We demonstrate the effectiveness of Ex-Twit on Twitter health-related data.
Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g novels), mainly in English. We approach AA via Authorship Verification on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overtake more specific aspects of personal style.
This article presents the emerging topic of dynamic search (DS). To position dynamic search in a larger research landscape, the article discusses in detail its relationship to related research topics and disciplines. The article reviews approaches to modeling dynamics during information seeking, with an emphasis on Reinforcement Learning (RL)-enabled methods. Details are given for how different approaches are used to model interactions among the human user, the search system, and the environment. The paper ends with a review of evaluations of dynamic search systems.
Many computational social science projects examine online discourse surrounding a specific trending topic. These works often involve the acquisition of large-scale corpora relevant to the event in question to analyze aspects of the response to the event. Keyword searches present a precision-recall trade-off and crowd-sourced annotations, while effective, are costly. This work aims to enable automatic and accurate ad-hoc retrieval of comments discussing a trending topic from a large corpus, using only a handful of seed news articles.