"Topic": models, code, and papers

ITTC @ TREC 2021 Clinical Trials Track

Feb 16, 2022
Thinh Hung Truong, Yulia Otmakhova, Rahmad Mahendra, Timothy Baldwin, Jey Han Lau, Trevor Cohn, Lawrence Cavedon, Damiano Spina, Karin Verspoor

This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track. The task focuses on the problem of matching eligible clinical trials to topics constituting a summary of a patient's admission notes. We explore different ways of representing trials and topics using NLP techniques, and then use a common retrieval model to generate the ranked list of relevant trials for each topic. The results from all our submitted runs are well above the median scores for all topics, but there is still plenty of scope for improvement.

* 7 pages 


Kernel Topic Models

Oct 21, 2011
Philipp Hennig, David Stern, Ralf Herbrich, Thore Graepel

Latent Dirichlet Allocation models discrete data as a mixture of discrete distributions, using Dirichlet beliefs over the mixture weights. We study a variation of this concept, in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links between earlier work in these areas.
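To make the "squashed Gaussian" idea concrete, here is a minimal sketch of how a latent Gaussian vector could be mapped to document mixture weights. It assumes the squashing function is a softmax and that the latent dimensions are independent; the abstract specifies neither, and the kernel structure between documents is left out entirely. The names `softmax` and `sample_topic_weights` are hypothetical.

```python
import math
import random


def softmax(values):
    """Map real-valued scores to a probability vector (the assumed 'squashing' step)."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_topic_weights(mean, std, rng):
    """Draw one latent Gaussian vector and squash it onto the probability simplex.

    Assumption: independent Gaussian dimensions. A kernel topic model would
    instead correlate the latent vectors of different documents via a kernel.
    """
    latent = [rng.gauss(m, s) for m, s in zip(mean, std)]
    return softmax(latent)


rng = random.Random(0)
weights = sample_topic_weights(mean=[0.0, 1.0, -1.0], std=[0.5, 0.5, 0.5], rng=rng)
print(weights)  # three positive numbers that sum to 1
```

Unlike a Dirichlet draw, the weights here come from a Gaussian, which is what allows a covariance (kernel) over documents to be placed on the latent vectors.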


Enriching very large ontologies using the WWW

Oct 17, 2000
Eneko Agirre, Olatz Ansa, Eduard Hovy, David Martinez

This paper explores the possibility of exploiting text on the world wide web to enrich the concepts in existing ontologies. First, a method to retrieve documents from the WWW related to a concept is described. These document collections are used 1) to construct topic signatures (lists of topically related words) for each concept in WordNet, and 2) to build hierarchical clusters of the concepts (the word senses) that lexicalize a given word. The overall goal is to overcome two shortcomings of WordNet: the lack of topical links among concepts, and the proliferation of senses. Topic signatures are validated on a word sense disambiguation task with good results, which are improved when the hierarchical clusters are used.

* Proceedings of the ECAI 2000 Workshop on Ontology Learning
* 6 pages 


An Adaptation of Topic Modeling to Sentences

Jul 20, 2016
Ruey-Cheng Chen, Reid Swanson, Andrew S. Gordon

Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this paper, we adapt Latent Dirichlet Allocation to include an additional layer that incorporates information about the sentence boundaries in documents. We show that adding this minimal information about document structure improves the perplexity of a trained model.

* 8 pages, 2010, unpublished 


KRM-based Dialogue Management

Dec 02, 2019
Wenwu Qu, Xiaoyu Chi, Wei Zheng

A KRM-based dialogue management (DM) approach is proposed for implementing human-computer dialogue systems in complex scenarios. KRM-based DM has strong descriptive ability and can keep the dialogue process logically coherent. A complex application scenario in the Internet of Things (IoT) industry and a dialogue system built on KRM-based DM are then introduced; the system allows enterprise customers to customize topics and adapts to the corresponding topics during interaction with users. The experimental results show that the system completes the interactive tasks well and effectively handles topic switching, information inheritance between topics, and changes of dominance.

* 9 pages, 4 figures


Variational Inference In Pachinko Allocation Machines

Apr 21, 2018
Akash Srivastava, Charles Sutton

The Pachinko Allocation Machine (PAM) is a deep topic model that represents rich correlation structures among topics with a directed acyclic graph over topics. Because of the flexibility of the model, however, approximate inference is very difficult. Perhaps for this reason, only a small number of potential PAM architectures have been explored in the literature. In this paper we present an efficient and flexible amortized variational inference method for PAM, using a deep inference network to parameterize the approximate posterior distribution in a manner similar to the variational autoencoder. Our inference method produces more coherent topics than state-of-the-art inference methods for PAM while being an order of magnitude faster, which allows exploration of a wider range of PAM architectures than has previously been studied.


An Exploratory Study of (#)Exercise in the Twittersphere

Dec 08, 2018
George Shaw, Amir Karami

Social media analytics allows us to extract, analyze, and establish semantics from user-generated content on social media platforms. This study used a mixed-method approach comprising a three-step process of data collection, topic modeling, and data annotation to recognize exercise-related patterns. Based on the findings, 86% of the detected topics were identified as meaningful after the data annotation process. The most discussed exercise-related topics were physical activity (18.7%), lifestyle behaviors (6.6%), and dieting (4%). The results from our experiment indicate that exploratory data analysis is a practical approach to summarizing the various characteristics of text data for different health and medical applications.


Content Modeling Using Latent Permutations

Jan 15, 2014
Harr Chen, S. R. K. Branavan, Regina Barzilay, David R. Karger

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.

* Journal Of Artificial Intelligence Research, Volume 36, pages 129-163, 2009 
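
As an illustration of the distribution over permutations mentioned in the abstract above, the following is a minimal sketch of sampling from a Generalized Mallows Model through its common inversion-count representation: each item gets a count of larger items that precede it, drawn with probability proportional to `exp(-rho * count)`. The function names, the 0-indexed items, and the dispersion values are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sample_inversion_count(max_count, rho, rng):
    """Sample v in {0, ..., max_count} with P(v) proportional to exp(-rho * v)."""
    counts = range(max_count + 1)
    weights = [math.exp(-rho * v) for v in counts]
    return rng.choices(counts, weights=weights)[0]


def permutation_from_inversions(inversions):
    """Build the ordering of items 0..n-1 in which item j is preceded by
    exactly inversions[j] items larger than j."""
    order = []
    # Insert from the largest item down, so the list only ever holds larger items.
    for item in range(len(inversions) - 1, -1, -1):
        order.insert(inversions[item], item)
    return order


def sample_gmm_permutation(rhos, rng):
    """Sample a permutation of n = len(rhos) + 1 items.

    rhos[j] > 0 is the (assumed) dispersion for item j; larger values
    concentrate mass on the identity ordering.
    """
    n = len(rhos) + 1
    inversions = [sample_inversion_count(n - 1 - j, rhos[j], rng) for j in range(n - 1)]
    inversions.append(0)  # the largest item has no larger items ahead of it
    return permutation_from_inversions(inversions)


rng = random.Random(0)
perm = sample_gmm_permutation([1.0, 1.0, 1.0, 1.0], rng)
print(perm)  # an ordering of the items 0..4
```

Because the inversion counts are independent, each one can be sampled separately, which is what makes this representation convenient inside a larger Bayesian model.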


Analysis of Risk Factor Domains in Psychosis Patient Health Records

Sep 15, 2018
Eben Holderness, Nicholas Miller, Philip Cawkwell, Kirsten Bolton, James Pustejovsky, Marie Meteer, Mei-Hua Hall

Readmission after discharge from a hospital is disruptive and costly, regardless of the reason. However, it can be particularly problematic for psychiatric patients, so predicting which patients may be readmitted is critically important but also very difficult. Clinical narratives in psychiatric electronic health records (EHRs) span a wide range of topics and vocabulary; therefore, a psychiatric readmission prediction model must begin with a robust and interpretable topic extraction component. We created a data pipeline for using document vector similarity metrics to perform topic extraction on psychiatric EHR data in service of our long-term goal of creating a readmission risk classifier. We show initial results for our topic extraction model and identify additional features we will be incorporating in the future.

* Accepted at EMNLP-LOUHI 2018 


A Dataset of General-Purpose Rebuttal

Sep 01, 2019
Matan Orbach, Yonatan Bilu, Ariel Gera, Yoav Kantor, Lena Dankin, Tamar Lavee, Lili Kotlerman, Shachar Mirkin, Michal Jacovi, Ranit Aharonov, Noam Slonim

In Natural Language Understanding, the task of response generation is usually focused on responses to short texts, such as tweets or a turn in a dialog. Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. We do this in the context of the recently-suggested task of listening comprehension over argumentative content: given a speech on some specified topic, and a list of relevant arguments, the goal is to determine which of the arguments appear in the speech. The general rebuttals we describe here (written in English) overcome the need for topic-specific arguments to be provided, by proving to be applicable for a large set of topics. This allows creating responses beyond the scope of topics for which specific arguments are available. All data collected during this work is freely available for research.

* EMNLP 2019 
