Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

O2D2: Out-Of-Distribution Detector to Capture Undecidable Trials in Authorship Verification

Jul 02, 2021
Benedikt Boenninghoff, Robert M. Nickel, Dorothea Kolossa

The PAN 2021 authorship verification (AV) challenge is part of a three-year strategy, moving from a cross-topic/closed-set AV task to a cross-topic/open-set AV task over a collection of fanfiction texts. In this work, we present a novel hybrid neural-probabilistic framework that is designed to tackle the challenges of the 2021 task. Our system is based on our 2020 winning submission, with updates to significantly reduce sensitivities to topical variations and to further improve the system's calibration by means of an uncertainty-adaptation layer. Our framework additionally includes an out-of-distribution detector (O2D2) for defining non-responses. Our proposed system outperformed all other systems that participated in the PAN 2021 AV task.

* [email protected] 2021 

  Access Paper or Ask Questions

MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

Mar 27, 2022
Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu

This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS \& NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options which requires a deeper language understanding as it tests the 10+ reasoning abilities of a model across a wide range of medical subjects \& topics. A detailed explanation of the solution, along with the above information, is provided in this study.

* ACM Conference on Health, Inference, and Learning (CHIL) 2022 
* Proceedings of Machine Learning Research (PMLR), ACM Conference on Health, Inference, and Learning (CHIL) 2022 

  Access Paper or Ask Questions

Textual Analysis of Communications in COVID-19 Infected Community on Social Media

May 03, 2021
Yuhan Liu, Yuhan Gao, Zhifan Nan, Long Chen

During the COVID-19 pandemic, people started to discuss about pandemic-related topics on social media. On subreddit \textit{r/COVID19positive}, a number of topics are discussed or being shared, including experience of those who got a positive test result, stories of those who presumably got infected, and questions asked regarding the pandemic and the disease. In this study, we try to understand, from a linguistic perspective, the nature of discussions on the subreddit. We found differences in linguistic characteristics (e.g. psychological, emotional and reasoning) across three different categories of topics. We also classified posts into the different categories using SOTA pre-trained language models. Such classification model can be used for pandemic-related research on social media.

* 5 pages, 4 figures, coursework for DS-GA 1011 

  Access Paper or Ask Questions

Categorical Mixture Models on VGGNet activations

Mar 06, 2018
Sean Billings

In this project, I use unsupervised learning techniques in order to cluster a set of yelp restaurant photos under meaningful topics. In order to do this, I extract layer activations from a pre-trained implementation of the popular VGGNet convolutional neural network. First, I explore using LDA with the activations of convolutional layers as features. Secondly, I explore using the object-recognition powers of VGGNet trained on ImageNet in order to extract meaningful objects from the photos, and then perform LDA to group the photos under topic-archetypes. I find that this second approach finds meaningful archetypes, which match the human intuition for photo topics such as restaurant, food, and drinks. Furthermore, these clusters align well and distinctly with the actual yelp photo labels.

  Access Paper or Ask Questions

RubyStar: A Non-Task-Oriented Mixture Model Dialog System

Dec 16, 2017
Huiting Liu, Tao Lin, Hanfei Sun, Weijian Lin, Chih-Wei Chang, Teng Zhong, Alexander Rudnicky

RubyStar is a dialog system designed to create "human-like" conversation by combining different response generation strategies. RubyStar conducts a non-task-oriented conversation on general topics by using an ensemble of rule-based, retrieval-based and generative methods. Topic detection, engagement monitoring, and context tracking are used for managing interaction. Predictable elements of conversation, such as the bot's backstory and simple question answering are handled by separate modules. We describe a rating scheme we developed for evaluating response generation. We find that character-level RNN is an effective generation model for general responses, with proper parameter settings; however other kinds of conversation topics might benefit from using other models.

  Access Paper or Ask Questions

Self-Calibrating Neural-Probabilistic Model for Authorship Verification Under Covariate Shift

Jun 21, 2021
Benedikt Boenninghoff, Dorothea Kolossa, Robert M. Nickel

We are addressing two fundamental problems in authorship verification (AV): Topic variability and miscalibration. Variations in the topic of two disputed texts are a major cause of error for most AV systems. In addition, it is observed that the underlying probability estimates produced by deep learning AV mechanisms oftentimes do not match the actual case counts in the respective training data. As such, probability estimates are poorly calibrated. We are expanding our framework from PAN 2020 to include Bayes factor scoring (BFS) and an uncertainty adaptation layer (UAL) to address both problems. Experiments with the 2020/21 PAN AV shared task data show that the proposed method significantly reduces sensitivities to topical variations and significantly improves the system's calibration.

* 12th International Conference of the CLEF Association, 2021 

  Access Paper or Ask Questions

The 1st Challenge on Remote Physiological Signal Sensing (RePSS)

Mar 26, 2020
Xiaobai Li, Hu Han, Hao Lu, Xuesong Niu, Zitong Yu, Antitza Dantcheva, Guoying Zhao, Shiguang Shan

Remote measurement of physiological signals from videos is an emerging topic. The topic draws great interests, but the lack of publicly available benchmark databases and a fair validation platform are hindering its further development. For this concern, we organize the first challenge on Remote Physiological Signal Sensing (RePSS), in which two databases of VIPL and OBF are provided as the benchmark for kin researchers to evaluate their approaches. The 1st challenge of RePSS focuses on measuring the average heart rate from facial videos, which is the basic problem of remote physiological measurement. This paper presents an overview of the challenge, including data, protocol, analysis of results and discussion. The top ranked solutions are highlighted to provide insights for researchers, and future directions are outlined for this topic and this challenge.

  Access Paper or Ask Questions

Yoga-Veganism: Correlation Mining of Twitter Health Data

Jun 15, 2019
Tunazzina Islam

Nowadays social media is a huge platform of data. People usually share their interest, thoughts via discussions, tweets, status. It is not possible to go through all the data manually. We need to mine the data to explore hidden patterns or unknown correlations, find out the dominant topic in data and understand people's interest through the discussions. In this work, we explore Twitter data related to health. We extract the popular topics under different categories (e.g. diet, exercise) discussed in Twitter via topic modeling, observe model behavior on new tweets, discover interesting correlation (i.e. Yoga-Veganism). We evaluate accuracy by comparing with ground truth using manual annotation both for train and test data.

* In Proceedings of 8th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM) @KDD 2019. arXiv admin note: substantial text overlap with arXiv:1906.02132 

  Access Paper or Ask Questions

Analyzing Self-Driving Cars on Twitter

Apr 05, 2018
Rizwan Sadiq, Mohsin Khan

This paper studies users' perception regarding a controversial product, namely self-driving (autonomous) cars. To find people's opinion regarding this new technology, we used an annotated Twitter dataset, and extracted the topics in positive and negative tweets using an unsupervised, probabilistic model known as topic modeling. We later used the topics, as well as linguist and Twitter specific features to classify the sentiment of the tweets. Regarding the opinions, the result of our analysis shows that people are optimistic and excited about the future technology, but at the same time they find it dangerous and not reliable. For the classification task, we found Twitter specific features, such as hashtags as well as linguistic features such as emphatic words among top attributes in classifying the sentiment of the tweets.

  Access Paper or Ask Questions

A Planning based Framework for Essay Generation

Jan 06, 2016
Bing Qin, Duyu Tang, Xinwei Geng, Dandan Ning, Jiahao Liu, Ting Liu

Generating an article automatically with computer program is a challenging task in artificial intelligence and natural language processing. In this paper, we target at essay generation, which takes as input a topic word in mind and generates an organized article under the theme of the topic. We follow the idea of text planning \cite{Reiter1997} and develop an essay generation framework. The framework consists of three components, including topic understanding, sentence extraction and sentence reordering. For each component, we studied several statistical algorithms and empirically compared between them in terms of qualitative or quantitative analysis. Although we run experiments on Chinese corpus, the method is language independent and can be easily adapted to other language. We lay out the remaining challenges and suggest avenues for future research.

  Access Paper or Ask Questions