"chatbots": models, code, and papers

TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

Feb 04, 2019
Thomas Wolf, Victor Sanh, Julien Chaumond, Clement Delangue

We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model. Fine-tuning is performed by using a multi-task objective which combines several unsupervised prediction tasks. The resulting fine-tuned model shows strong improvements over the current state-of-the-art end-to-end conversational models like memory augmented seq2seq and information-retrieval models. On the privately held PERSONA-CHAT dataset of the Conversational Intelligence Challenge 2, this approach obtains a new state-of-the-art, with respective perplexity, Hits@1 and F1 metrics of 16.28 (45 % absolute improvement), 80.7 (46 % absolute improvement) and 19.5 (20 % absolute improvement).

* 6 pages, 2 figures, 2 tables, NeurIPS 2018 CAI Workshop 
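
The three metrics named in the abstract above (perplexity, a top-1 ranking accuracy usually written Hits@1, and F1) can be sketched in a few lines. This is a minimal illustration of the standard textbook definitions, not the challenge's official scorer: the function names are hypothetical, and whitespace tokenization is an assumption that the real evaluation may not share.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Exponentiated mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def hits_at_1(score_lists, gold_indices):
    """Fraction of examples whose highest-scoring candidate is the gold one."""
    hits = 0
    for scores, gold in zip(score_lists, gold_indices):
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == gold)
    return hits / len(gold_indices)


def word_f1(prediction, reference):
    """Harmonic mean of word-level precision and recall.

    Assumption: tokens are split on whitespace with no further normalization.
    """
    pred_tokens, ref_tokens = prediction.split(), reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Lower perplexity is better, while higher Hits@1 and F1 are better, which is worth keeping in mind when reading the improvement figures.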

Improving Neural Conversational Models with Entropy-Based Data Filtering

Jun 04, 2019
Richard Csaky, Patrik Purgai, Gabor Recski

Current neural network-based conversational models lack diversity and generate boring responses to open-ended utterances. Priors such as persona, emotion, or topic provide additional information to dialog models to aid response generation, but annotating a dataset with priors is expensive and such annotations are rarely available. While previous methods for improving the quality of open-domain response generation focused on either the underlying model or the training objective, we present a method of filtering dialog datasets by removing generic utterances from training data using a simple entropy-based approach that does not require human supervision. We conduct extensive experiments with different variations of our method, and compare dialog models across 17 evaluation metrics to show that training on datasets filtered this way results in better conversational quality as chatbots learn to output more diverse responses.

* 20 pages. To be presented at ACL 2019. Camera-ready 
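
One plausible reading of entropy-based filtering is sketched below. The abstract does not give the exact procedure, so everything here is an assumption: the function names are hypothetical, utterances are matched by exact string equality, and a response is treated as "generic" when it follows many different source utterances, i.e. when the empirical distribution of its preceding sources has high entropy.

```python
import math
from collections import Counter, defaultdict


def source_entropy_per_target(pairs):
    """Map each response to the entropy (in bits) of the sources it follows."""
    sources_by_target = defaultdict(Counter)
    for source, target in pairs:
        sources_by_target[target][source] += 1

    entropy = {}
    for target, counts in sources_by_target.items():
        total = sum(counts.values())
        entropy[target] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def filter_generic(pairs, threshold):
    """Keep only pairs whose response entropy is at or below the threshold."""
    entropy = source_entropy_per_target(pairs)
    return [(s, t) for s, t in pairs if entropy[t] <= threshold]
```

With a threshold of 1.0 bit, a response such as "i don't know" that follows three distinct sources (entropy of about 1.58 bits) would be dropped, while a response seen after a single source (entropy 0) would be kept. The threshold is a free parameter in this sketch.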

ColBERT: Using BERT Sentence Embedding for Humor Detection

Apr 27, 2020
Issa Annamoradnejad

Automatic humor detection has interesting use cases in modern technologies, such as chatbots and personal assistants. In this paper, we describe a novel approach for detecting humor in short texts using BERT sentence embedding. Our proposed model uses BERT to generate tokens and sentence embedding for texts. It sends embedding outputs as input to a two-layered neural network that predicts the target value. For evaluation, we created a new dataset for humor detection consisting of 200k formal short texts (100k positive, 100k negative). Experimental results show an accuracy of 98.1 percent for the proposed method, 2.1 percent improvement compared to the best CNN and RNN models and 1.1 percent better than a fine-tuned BERT model. In addition, the combination of RNN-CNN was not successful in this task compared to the CNN model.

* 7 pages, 3 tables 

Improved Text Language Identification for the South African Languages

Nov 01, 2017
Bernardt Duvenhage, Mfundo Ntini, Phala Ramonyai

Virtual assistants and text chatbots have recently been gaining popularity. Given the short message nature of text-based chat interactions, the language identification systems of these bots might only have 15 or 20 characters to make a prediction. However, accurate text language identification is important, especially in the early stages of many multilingual natural language processing pipelines. This paper investigates the use of a naive Bayes classifier to accurately predict the language family that a piece of text belongs to, combined with a lexicon-based classifier to distinguish the specific South African language that the text is written in. This approach leads to a 31% reduction in the language detection error. In the spirit of reproducible research, the training and testing datasets as well as the code are published on GitHub. Hopefully it will be useful to create a text language identification shared task for South African languages.

* Accepted to appear in the proceedings of The 28th Annual Symposium of the Pattern Recognition Association of South Africa, 2017 
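
The naive Bayes stage described in this abstract can be illustrated with a small character n-gram classifier. This is only a sketch under stated assumptions, not the authors' published code: the class name, the choice of trigrams, and add-one smoothing are all hypothetical, and the lexicon-based second stage is omitted.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangId:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In a two-stage setup like the one the abstract describes, the labels passed to `fit` would be language families, and a separate lexicon lookup would then pick the specific language within the predicted family.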

Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

Aug 17, 2019
Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston

The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galán-García et al., 2016), and the deployment of chatbots in the public domain (Wolf et al., 2017) are two examples that show the necessity of guarding against adversarially offensive behavior on the part of humans. In this work, we develop a training scheme for a model to become robust to such human attacks by an iterative build it, break it, fix it strategy with humans and models in the loop. In detailed experiments we show this approach is considerably more robust than previous systems. Further, we show that offensive language used within a conversation critically depends on the dialogue context, and cannot be viewed as a single sentence offensive detection task as in most previous work. Our newly collected tasks and methods will be made open source and publicly available.


Theme-aware generation model for chinese lyrics

May 23, 2019
Jie Wang, Xinyan Zhao

With the rapid development of neural networks, deep learning has been extended to various natural language generation fields, such as machine translation, dialogue generation and even literature creation. In this paper, we propose a theme-aware language generation model for Chinese music lyrics, which greatly improves the theme connectivity and coherence of generated paragraphs. A multi-channel sequence-to-sequence (seq2seq) model encodes themes and previous sentences as global and local contextual information. Moreover, an attention mechanism is incorporated for sequence decoding, enabling the model to fuse context into the predicted next text. To prepare an appropriate training corpus, LDA (Latent Dirichlet Allocation) is applied for theme extraction. The generated lyrics are grammatically correct and semantically coherent with the selected themes, which offers a valuable modelling method for other fields, including multi-turn chatbots and long paragraph generation.


Dialog Intent Induction via Density-based Deep Clustering Ensemble

Jan 18, 2022
Jiashu Pu, Guandan Chen, Yongzhu Chang, Xiaoxi Mao

Existing task-oriented chatbots heavily rely on spoken language understanding (SLU) systems to determine the intent of a user's utterance and other key information for fulfilling specific tasks. In real-life applications, it is crucial to occasionally induce novel dialog intents from the conversation logs to improve the user experience. In this paper, we propose the Density-based Deep Clustering Ensemble (DDCE) method for dialog intent induction. Compared to existing K-means based methods, our proposed method is more effective in dealing with real-life scenarios where a large number of outliers exist. To maximize data utilization, we jointly optimize texts' representations and the hyperparameters of the clustering algorithm. In addition, we design an outlier-aware clustering ensemble framework to handle the overfitting issue. Experimental results over seven datasets show that our proposed method significantly outperforms other state-of-the-art baselines.

* accepted by AAAI-22 W16: Dialog System Technology Challenge (DSTC10) 

I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling

Dec 28, 2020
Yixin Nie, Mary Williamson, Mohit Bansal, Douwe Kiela, Jason Weston

To quantify how well natural language understanding models can capture consistency in a general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues. We then compare a structured utterance-based approach of using pre-trained Transformer models for contradiction detection with the typical unstructured approach. Results reveal that: (i) our newly collected dataset is notably more effective at providing supervision for the dialogue contradiction detection task than existing NLI data including those aimed to cover the dialogue domain; (ii) the structured utterance-based approach is more robust and transferable on both analysis and out-of-distribution dialogues than its unstructured counterpart. We also show that our best contradiction detection model correlates well with human judgments and further provide evidence for its usage in both automatically evaluating and improving the consistency of state-of-the-art generative chatbots.

* 15 pages