Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Improving automated segmentation of radio shows with audio embeddings

Feb 12, 2020
Oberon Berlage, Klaus-Michael Lux, David Graus

Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance.

* 5 pages, 2 figures, submitted to ICASSP2020 

  Access Paper or Ask Questions

Societal Controversies in Wikipedia Articles

Apr 18, 2019
Erik Borra, Andreas Kaltenbrunner, Michele Mauri, Esther Weltevrede, David Laniado, Richard Rogers, Paolo Ciuccarelli, Giovanni Magni, Tommaso Venturini

Collaborative content creation inevitably reaches situations where different points of view lead to conflict. We focus on Wikipedia, the free encyclopedia anyone may edit, where disputes about content in controversial articles often reflect larger societal debates. While Wikipedia has a public edit history and discussion section for every article, the substance of these sections is difficult to phantom for Wikipedia users interested in the development of an article and in locating which topics were most controversial. In this paper we present Contropedia, a tool that augments Wikipedia articles and gives insight into the development of controversial topics. Contropedia uses an efficient language agnostic measure based on the edit history that focuses on wiki links to easily identify which topics within a Wikipedia article have been most controversial and when.

* the 33rd Annual ACM Conference, Apr 2015, Seoul, France. pp.193-196 

  Access Paper or Ask Questions

Exploring semantically-related concepts from Wikipedia: the case of SeRE

Apr 27, 2015
Daniel Hienert, Dennis Wegener, Siegfried Schomisch

In this paper we present our web application SeRE designed to explore semantically related concepts. Wikipedia and DBpedia are rich data sources to extract related entities for a given topic, like in- and out-links, broader and narrower terms, categorisation information etc. We use the Wikipedia full text body to compute the semantic relatedness for extracted terms, which results in a list of entities that are most relevant for a topic. For any given query, the user interface of SeRE visualizes these related concepts, ordered by semantic relatedness; with snippets from Wikipedia articles that explain the connection between those two entities. In a user study we examine how SeRE can be used to find important entities and their relationships for a given topic and to answer the question of how the classification system can be used for filtering.

* In Classification & visualization : interfaces to knowledge ; proceedings of the International UDC Seminar 24 - 25 October 2013, The Hague, The Netherlands, edited by Aida Slavic, Almila Akdag Salah, and Sylvie Davies, 153-165. W\"urzburg: Ergon-Verl 

  Access Paper or Ask Questions

Athena: Constructing Dialogues Dynamically with Discourse Constraints

Nov 21, 2020
Vrindavan Harrison, Juraj Juraska, Wen Cui, Lena Reed, Kevin K. Bowden, Jiaqi Wu, Brian Schwarzmann, Abteen Ebrahimi, Rishi Rajasekaran, Nikhil Varghese, Max Wechsler-Azen, Steve Whittaker, Jeffrey Flanigan, Marilyn Walker

This report describes Athena, a dialogue system for spoken conversation on popular topics and current events. We develop a flexible topic-agnostic approach to dialogue management that dynamically configures dialogue based on general principles of entity and topic coherence. Athena's dialogue manager uses a contract-based method where discourse constraints are dispatched to clusters of response generators. This allows Athena to procure responses from dynamic sources, such as knowledge graph traversals and feature-based on-the-fly response retrieval methods. After describing the dialogue system architecture, we perform an analysis of conversations that Athena participated in during the 2019 Alexa Prize Competition. We conclude with a report on several user studies we carried out to better understand how individual user characteristics affect system ratings.

* 3rd Proceedings of Alexa Prize (Alexa Prize 2019) 

  Access Paper or Ask Questions

A Survey of Syntactic-Semantic Parsing Based on Constituent and Dependency Structures

Jun 19, 2020
Meishan Zhang

Syntactic and semantic parsing has been investigated for decades, which is one primary topic in the natural language processing community. This article aims for a brief survey on this topic. The parsing community includes many tasks, which are difficult to be covered fully. Here we focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing. Constituent parsing is majorly targeted to syntactic analysis, and dependency parsing can handle both syntactic and semantic analysis. This article briefly reviews the representative models of constituent parsing and dependency parsing, and also dependency graph parsing with rich semantics. Besides, we also review the closely-related topics such as cross-domain, cross-lingual and joint parsing models, parser application as well as corpus development of parsing in the article.

* SCIENCE CHINA Technological Sciences 

  Access Paper or Ask Questions

Short Text Classification Improved by Feature Space Extension

Apr 02, 2019
Yanxuan Li

With the explosive development of mobile Internet, short text has been applied extensively. The difference between classifying short text and long documents is that short text is of shortness and sparsity. Thus, it is challenging to deal with short text classification owing to its less semantic information. In this paper, we propose a novel topic-based convolutional neural network (TB-CNN) based on Latent Dirichlet Allocation (LDA) model and convolutional neural network. Comparing to traditional CNN methods, TB-CNN generates topic words with LDA model to reduce the sparseness and combines the embedding vectors of topic words and input words to extend feature space of short text. The validation results on IMDB movie review dataset show the improvement and effectiveness of TB-CNN.

* 8 pages,2 figures and 7 be published in 

  Access Paper or Ask Questions

Leveraging Natural Learning Processing to Uncover Themes in Clinical Notes of Patients Admitted for Heart Failure

Apr 14, 2022
Ankita Agarwal, Krishnaprasad Thirunarayan, William L. Romine, Amanuel Alambo, Mia Cajita, Tanvi Banerjee

Heart failure occurs when the heart is not able to pump blood and oxygen to support other organs in the body as it should. Treatments include medications and sometimes hospitalization. Patients with heart failure can have both cardiovascular as well as non-cardiovascular comorbidities. Clinical notes of patients with heart failure can be analyzed to gain insight into the topics discussed in these notes and the major comorbidities in these patients. In this regard, we apply machine learning techniques, such as topic modeling, to identify the major themes found in the clinical notes specific to the procedures performed on 1,200 patients admitted for heart failure at the University of Illinois Hospital and Health Sciences System (UI Health). Topic modeling revealed five hidden themes in these clinical notes, including one related to heart disease comorbidities.

* 4 pages, 2 tables, accepted in IEEE EMBC 2022 conference (IEEE International Engineering in Medicine and Biology Conference) 

  Access Paper or Ask Questions

A Generalized Hierarchical Nonnegative Tensor Decomposition

Sep 30, 2021
Joshua Vendrow, Jamie Haddock, Deanna Needell

Nonnegative matrix factorization (NMF) has found many applications including topic modeling and document analysis. Hierarchical NMF (HNMF) variants are able to learn topics at various levels of granularity and illustrate their hierarchical relationship. Recently, nonnegative tensor factorization (NTF) methods have been applied in a similar fashion in order to handle data sets with complex, multi-modal structure. Hierarchical NTF (HNTF) methods have been proposed, however these methods do not naturally generalize their matrix-based counterparts. Here, we propose a new HNTF model which directly generalizes a HNMF model special case, and provide a supervised extension. We also provide a multiplicative updates training method for this model. Our experimental results show that this model more naturally illuminates the topic hierarchy than previous HNMF and HNTF methods.

* 6 pages, 2 figues, 3 tables 

  Access Paper or Ask Questions

Spatial Semantic Scan: Jointly Detecting Subtle Events and their Spatial Footprint

May 28, 2016
Abhinav Maurya

Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. We describe Spatially Compact Semantic Scan (SCSS) that has been developed specifically to overcome the shortcomings of current methods in detecting new spatially compact events in text streams. SCSS employs alternating optimization between using semantic scan to estimate contrastive foreground topics in documents, and discovering spatial neighborhoods with high occurrence of documents containing the foreground topics. We evaluate our method on Emergency Department chief complaints dataset (ED dataset) to verify the effectiveness of our method in detecting real-world disease outbreaks from free-text ED chief complaint data.

* 26 pages 

  Access Paper or Ask Questions

Stochastic Variational Inference

Apr 22, 2013
Matt Hoffman, David M. Blei, Chong Wang, John Paisley

We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.

  Access Paper or Ask Questions