Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"chatbots": models, code, and papers

Leaf: Multiple-Choice Question Generation

Jan 22, 2022
Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs). The code and the demo are available on

* Accepted to ECIR 2022 (Demo) 
Access Paper or Ask Questions

Automatic Article Commenting: the Task and Dataset

May 11, 2018
Lianhui Qin, Lemao Liu, Victoria Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, Shuming Shi

Comments of online articles provide extended views and improve user engagement. Automatically making comments thus become a valuable functionality for online forums, intelligent chatbots, etc. This paper proposes the new task of automatic article commenting, and introduces a large-scale Chinese dataset with millions of real comments and a human-annotated subset characterizing the comments' varying quality. Incorporating the human bias of comment quality, we further develop automatic metrics that generalize a broad set of popular reference-based metrics and exhibit greatly improved correlations with human evaluations.

* ACL2018; with supplements; Dataset link available in the paper 
Access Paper or Ask Questions

KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension

Sep 17, 2019
Seungyoung Lim, Myungji Kim, Jooyoul Lee

Machine Reading Comprehension (MRC) is a task that requires machine to understand natural language and answer questions by reading a document. It is the core of automatic response technology such as chatbots and automatized customer supporting systems. We present Korean Question Answering Dataset(KorQuAD), a large-scale Korean dataset for extractive machine reading comprehension task. It consists of 70,000+ human generated question-answer pairs on Korean Wikipedia articles. We release KorQuAD1.0 and launch a challenge at to encourage the development of multilingual natural language processing research.

Access Paper or Ask Questions

Incrementalizing RASA's Open-Source Natural Language Understanding Pipeline

Jul 11, 2019
Andrew Rafla, Casey Kennington

As spoken dialogue systems and chatbots are gaining more widespread adoption, commercial and open-sourced services for natural language understanding are emerging. In this paper, we explain how we altered the open-source RASA natural language understanding pipeline to process incrementally (i.e., word-by-word), following the incremental unit framework proposed by Schlangen and Skantze. To do so, we altered existing RASA components to process incrementally, and added an update-incremental intent recognition model as a component to RASA. Our evaluations on the Snips dataset show that our changes allow RASA to function as an effective incremental natural language understanding service.

* 5 pages, 1 figure 
Access Paper or Ask Questions

Debbie, the Debate Bot of the Future

Sep 10, 2017
Geetanjali Rakshit, Kevin K. Bowden, Lena Reed, Amita Misra, Marilyn Walker

Chatbots are a rapidly expanding application of dialogue systems with companies switching to bot services for customer support, and new applications for users interested in casual conversation. One style of casual conversation is argument, many people love nothing more than a good argument. Moreover, there are a number of existing corpora of argumentative dialogues, annotated for agreement and disagreement, stance, sarcasm and argument quality. This paper introduces Debbie, a novel arguing bot, that selects arguments from conversational corpora, and aims to use them appropriately in context. We present an initial working prototype of Debbie, with some preliminary evaluation and describe future work.

* IWSDS 2017 
Access Paper or Ask Questions

A Neural Entity Coreference Resolution Review

Oct 21, 2019
Nikolaos Stylianou, Ioannis Vlahavas

Entity Coreference Resolution is the task of resolving all the mentions in a document that refer to the same real world entity and is considered as one of the most difficult tasks in natural language understanding. While in it is not an end task, it has been proved to improve downstream natural language processing tasks such as entity linking, machine translation, summarization and chatbots. We conducted a systematic a review of neural-based approached and provide a detailed appraisal of the datasets and evaluation metrics in the field. Emphasis is given on Pronoun Resolution, a subtask of Coreference Resolution, which has seen various improvements in the recent years. We conclude the study by highlight the lack of agreed upon standards and propose a way to expand the task even further.

Access Paper or Ask Questions

COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter

May 15, 2020
Martin Müller, Marcel Salathé, Per E Kummervold

In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19. Our model shows a 10-30% marginal improvement compared to its base model, BERT-Large, on five different classification datasets. The largest improvements are on the target domain. Pretrained transformer models, such as CT-BERT, are trained on a specific target domain and can be used for a wide variety of natural language processing tasks, including classification, question-answering and chatbots. CT-BERT is optimised to be used on COVID-19 content, in particular social media posts from Twitter.

Access Paper or Ask Questions

Decision-support for the Masses by Enabling Conversations with Open Data

Sep 16, 2018
Biplav Srivastava

Open data refers to data that is freely available for reuse. Although there has been rapid increase in availability of open data to public in the last decade, this has not translated into better decision-support tools for them. We propose intelligent conversation generators as a grand challenge that would automatically create data-driven conversation interfaces (CIs), also known as chatbots or dialog systems, from open data and deliver personalized analytical insights to users based on their contextual needs. Such generators will not only help bring Artificial Intelligence (AI)-based solutions for important societal problems to the masses but also advance AI by providing an integrative testbed for human-centric AI and filling gaps in the state-of-art towards this aim.

* 6 pages. arXiv admin note: text overlap with arXiv:1803.09789 
Access Paper or Ask Questions

A Russian Jeopardy! Data Set for Question-Answering Systems

Dec 04, 2021
Elena Mikhalkova

Question answering (QA) is one of the most common NLP tasks that relates to named entity recognition, fact extraction, semantic search and some other fields. In industry, it is much appreciated in chatbots and corporate information systems. It is also a challenging task that attracted the attention of a very general audience at the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions with 29,375 from the Russian analogue of Jeopardy! - "Own Game". We observe its linguistic features and the related QA-task. We conclude about perspectives of a QA competition based on the data set collected from this database.

* Reviews from AIST-2021 (rejected) Rev. 1 SCORE: 3 (strong accept) Rev. 2 SCORE: 2 (accept) Rev. 3 SCORE: -3 (strong reject) Rev. 4 SCORE: -2 (reject) Meta-reviewer review. ..However, the authors did not study the obtained dataset, they did not apply any models to the data. Therefore the content of the paper seems too small if compared to other papers submitted to the conference 
Access Paper or Ask Questions

Learning Representations for Zero-Shot Retrieval over Structured Data

Oct 29, 2021
Harsh Kohli

Large Scale Question-Answering systems today are widely used in downstream applications such as chatbots and conversational dialogue agents. Typically, such systems consist of an Answer Passage retrieval layer coupled with Machine Comprehension models trained on natural language query-passage pairs. Recent studies have explored Question Answering over structured data sources such as web-tables and relational databases. However, architectures such as Seq2SQL assume the correct table a priori which is input to the model along with the free text question. Our proposed method, analogues to a passage retrieval model in traditional Question-Answering systems, describes an architecture to discern the correct table pertaining to a given query from amongst a large pool of candidate tables.

* 11 pages, 3 figures, 2 tables 
Access Paper or Ask Questions