Igor Shalyminov

Enhancing Abstractiveness of Summarization Models through Calibrated Distillation

Oct 20, 2023
Hwanjun Song, Igor Shalyminov, Hang Su, Siffi Singh, Kaisheng Yao, Saab Mansour

Sequence-level knowledge distillation reduces the size of Seq2Seq models for more efficient abstractive summarization. However, it often leads to a loss of abstractiveness in summarization. In this paper, we propose a novel approach named DisCal to enhance the level of abstractiveness (measured by n-gram overlap) without sacrificing the informativeness (measured by ROUGE) of generated summaries. DisCal exposes diverse pseudo summaries to the student model with two types of supervision. Firstly, the best pseudo summary is identified in terms of abstractiveness and informativeness and used for sequence-level distillation. Secondly, the pseudo summaries' ranks are used to encourage the student model to assign higher prediction scores to summaries with higher ranks. Our experiments show that DisCal outperforms prior methods in abstractive summarization distillation, producing highly abstractive and informative summaries.

* Accepted at EMNLP-Findings 2023 
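The two supervision signals lend themselves to a compact sketch. The stdlib-only Python below is an illustration, not the paper's implementation: the abstractiveness proxy, the mixing weight `alpha`, the `margin`, and all function names are assumptions.

```python
def novel_ngram_ratio(summary, source, n=2):
    """Abstractiveness proxy: fraction of summary n-grams absent from the source."""
    def ngrams(text):
        tokens = text.split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    summ, src = ngrams(summary), ngrams(source)
    return len(summ - src) / max(len(summ), 1)

def rank_candidates(candidates, source, rouge_scores, alpha=0.5):
    """Order pseudo summaries by a weighted mix of abstractiveness and
    informativeness; the top one serves as the sequence-level distillation target."""
    scored = [(alpha * novel_ngram_ratio(c, source) + (1 - alpha) * r, c)
              for c, r in zip(candidates, rouge_scores)]
    return [c for _, c in sorted(scored, reverse=True)]

def calibration_loss(model_scores, margin=0.1):
    """Pairwise hinge loss: model_scores is ordered best-first; penalise any
    pair where a lower-ranked summary scores too close to a higher-ranked one."""
    loss = 0.0
    for i in range(len(model_scores)):
        for j in range(i + 1, len(model_scores)):
            loss += max(0.0, model_scores[j] - model_scores[i] + margin * (j - i))
    return loss
```

In this toy form, a perfectly calibrated student (scores already ordered best-first) incurs zero loss, while any rank inversion is penalised in proportion to the rank gap.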

Data-Efficient Methods for Dialogue Systems

Dec 05, 2020
Igor Shalyminov

Conversational User Interfaces (CUIs) have become ubiquitous in everyday life, in consumer-focused products like Siri and Alexa as well as in business-oriented solutions. Deep learning underlies many recent breakthroughs in dialogue systems but requires very large amounts of training data, often annotated by experts. When trained with smaller data, these methods end up severely lacking robustness (e.g. to disfluencies and out-of-domain input) and often have too little generalisation power. In this thesis, we address the above issues by introducing a series of methods for training robust dialogue systems from minimal data. Firstly, we study two orthogonal approaches to dialogue, linguistically informed and machine learning-based, from the data-efficiency perspective, and outline the steps to obtain data-efficient solutions with either approach. We then introduce two data-efficient models for dialogue response generation: the Dialogue Knowledge Transfer Network, based on latent-variable dialogue representations, and the hybrid Generative-Retrieval Transformer model (ranked first at the DSTC 8 Fast Domain Adaptation task). Next, we address the problem of robustness given minimal data; as such, we propose a multitask LSTM-based model for domain-general disfluency detection. For the problem of out-of-domain input, we present Turn Dropout, a data augmentation technique for anomaly detection using only in-domain data, and introduce autoencoder-augmented models for efficient training with Turn Dropout. Finally, we focus on social dialogue and introduce a neural model for response ranking in social conversation used in Alana, the 3rd-place winner in the Amazon Alexa Prize 2017 and 2018. We employ a novel technique of predicting the dialogue length as the main ranking objective and show that this approach improves upon the ratings-based counterpart in terms of data efficiency while matching it in performance.

* PhD thesis submitted at Heriot-Watt University. Contains previously published work (see the list in Section 1.4) 

Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation

Mar 06, 2020
Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz

Domain adaptation has recently become a key problem in dialogue systems research. Deep learning, while being the preferred technique for modeling such systems, works best given massive training data. However, in real-world scenarios, such resources aren't available for every new domain, so the ability to train with a few dialogue examples can be considered essential. Pre-training on large data sources and adapting to the target data has become the standard method for few-shot problems within the deep learning framework. In this paper, we present the winning entry at the fast domain adaptation task of DSTC8: a hybrid generative-retrieval model based on GPT-2, fine-tuned on the multi-domain MetaLWOz dataset. Robust and diverse in response generation, our model uses retrieval logic as a fallback; it is SoTA on MetaLWOz in human evaluation (>4% improvement over the 2nd-place system) and attains competitive generalization performance in adaptation to the unseen MultiWOZ dataset.

* Presented at DSTC8@AAAI 2020 
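The retrieval-as-fallback routing can be sketched as below; this is a minimal illustration under assumed names, not the system's actual API (the real model uses GPT-2 likelihoods and a learned retrieval component).

```python
def respond(context, generate, retrieve, confidence, threshold=0.5):
    """Generate first; if the generator's confidence in its own output is
    below `threshold`, fall back to retrieving a response from the training
    data. Returns the response and which branch produced it."""
    candidate = generate(context)
    if confidence(context, candidate) >= threshold:
        return candidate, "generated"
    return retrieve(context), "retrieved"
```

The design choice is that generation handles diversity while retrieval guarantees a sane floor on response quality for inputs the generator is unsure about.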

Data-Efficient Goal-Oriented Conversation with Dialogue Knowledge Transfer Networks

Oct 03, 2019
Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon

Goal-oriented dialogue systems are now being widely adopted in industry, where it is of key importance to maintain a rapid prototyping cycle for new products and domains. Data-driven dialogue system development has to be adapted to meet this requirement; therefore, reducing the amount of data and annotations necessary for training such systems is a central research problem. In this paper, we present the Dialogue Knowledge Transfer Network (DiKTNet), a state-of-the-art approach to goal-oriented dialogue generation which only uses a few example dialogues (i.e. few-shot learning), none of which has to be annotated. We achieve this by performing two-stage training. Firstly, we perform unsupervised dialogue representation pre-training on a large source of goal-oriented dialogues in multiple domains, the MetaLWOz corpus. Secondly, at the transfer stage, we train DiKTNet using this representation together with two other textual knowledge sources with different levels of generality: an ELMo encoder and the main dataset's source domains. Our main dataset is the Stanford Multi-Domain dialogue corpus. We evaluate our model on it in terms of BLEU and Entity F1 scores, and show that our approach significantly and consistently improves upon a series of baseline models as well as upon ZSDG, the previous state-of-the-art dialogue generation model. The improvement upon the latter, up to 10% in Entity F1 and an average of 3% in BLEU score, is achieved using only the equivalent of 10% of ZSDG's in-domain training data.

* EMNLP 2019 
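The two-stage recipe has a simple skeleton, sketched here with the model-specific steps passed in as callables; the function names are placeholders, not DiKTNet's released code.

```python
def train_two_stage(source_dialogues, target_examples, pretrain, transfer):
    """Stage 1: unsupervised dialogue-representation pre-training on the
    large multi-domain source (MetaLWOz in the paper). Stage 2: few-shot
    transfer, fitting the generator on the unannotated target examples on
    top of those representations."""
    representations = pretrain(source_dialogues)
    return transfer(representations, target_examples)
```

The point of the split is that only stage 2 touches the target domain, so the few target dialogues never need annotation.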

Few-Shot Dialogue Generation Without Annotated Data: A Transfer Learning Approach

Aug 16, 2019
Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon

Learning with minimal data is one of the key challenges in the development of practical, production-ready goal-oriented dialogue systems. In a real-world enterprise setting, where dialogue systems are developed rapidly and are expected to work robustly for an ever-growing variety of domains, products, and scenarios, efficient learning from a limited number of examples becomes indispensable. In this paper, we introduce a technique to achieve state-of-the-art dialogue generation performance in a few-shot setup, without using any annotated data. We do this by leveraging background knowledge from a larger, more highly represented dialogue source: the MetaLWOz dataset. We evaluate our model on the Stanford Multi-Domain Dialogue Dataset, consisting of human-human goal-oriented dialogues in the in-car navigation, appointment scheduling, and weather information domains. We show that our few-shot approach achieves state-of-the-art results on that dataset by consistently outperforming the previous best model in terms of BLEU and Entity F1 scores, while being more data-efficient by not requiring any data annotation.

* Accepted at SigDial 2019 

Contextual Out-of-Domain Utterance Handling With Counterfeit Data Augmentation

May 24, 2019
Sungjin Lee, Igor Shalyminov

Neural dialog models often lack robustness to anomalous user input and produce inappropriate responses, which leads to a frustrating user experience. Although there is a set of prior approaches to out-of-domain (OOD) utterance detection, they share a few restrictions: they rely on OOD data or multiple sub-domains, and their OOD detection is context-independent, which leads to suboptimal performance in a dialog. The goal of this paper is to propose a novel OOD detection method that does not require OOD data, by utilizing counterfeit OOD turns in the context of a dialog. For the sake of fostering further research, we also release new dialog datasets: 3 publicly available dialog corpora augmented with OOD turns in a controllable way. Our method outperforms state-of-the-art dialog models equipped with a conventional OOD detection mechanism by a large margin in the presence of OOD utterances.

* ICASSP 2019 
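One simple way to counterfeit OOD turns is to splice utterances from unrelated corpora into an in-domain dialog and label them as OOD. The sketch below is an assumed illustration of that idea; the insertion rate and labelling scheme are not the paper's exact procedure.

```python
import random

def counterfeit_ood_augment(dialogue, foreign_utterances, rate=0.3, seed=0):
    """Insert utterances drawn from unrelated corpora into an in-domain
    dialogue as counterfeit OOD turns, labelling every turn so a contextual
    OOD detector can be trained without any real OOD data."""
    rng = random.Random(seed)
    augmented = []
    for turn in dialogue:
        if rng.random() < rate:
            augmented.append((rng.choice(foreign_utterances), "OOD"))
        augmented.append((turn, "IND"))
    return augmented
```

Because the counterfeit turns appear inside real dialog contexts, a detector trained on this data can condition on context rather than judging each utterance in isolation.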

Improving Robustness of Neural Dialog Systems in a Data-Efficient Way with Turn Dropout

Nov 29, 2018
Igor Shalyminov, Sungjin Lee

Neural network-based dialog models often lack robustness to anomalous, out-of-domain (OOD) user input, which leads to unexpected dialog behavior and thus considerably limits such models' usage in mission-critical production environments. The problem is especially relevant in the setting of dialog system bootstrapping with limited training data and no access to OOD examples. In this paper, we explore the problem of robustness of such systems to anomalous input and the associated trade-off between accuracies on seen and unseen data. We present a new dataset for studying the robustness of dialog systems to OOD input: bAbI Dialog Task 6 augmented with OOD content in a controlled way. We then present turn dropout, a simple yet efficient negative-sampling-based technique for improving the robustness of neural dialog models. We demonstrate its effectiveness applied to Hybrid Code Network-family models (HCNs), which reach state-of-the-art results on our OOD-augmented dataset as well as the original one. Specifically, an HCN trained with turn dropout achieves state-of-the-art performance of more than 75% per-utterance accuracy on the augmented dataset's OOD turns and 74% F1-score as an OOD detector. Furthermore, we introduce a Variational HCN enhanced with turn dropout which achieves more than 56.5% accuracy on the original bAbI Task 6 dataset, thus outperforming the initially reported HCN result.

* NeurIPS 2018 workshop on Conversational AI 
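As a negative-sampling technique, turn dropout can be sketched as follows; the placeholder token, fallback action name, and dropout rate are illustrative assumptions, not the paper's exact settings.

```python
import random

def turn_dropout(batch, ood_action="OOD_FALLBACK", rate=0.2, seed=0):
    """With probability `rate`, replace a user turn with a corrupted
    placeholder and retarget its system action to a fallback, teaching the
    model to respond safely to input it has never seen."""
    rng = random.Random(seed)
    out = []
    for user_turn, system_action in batch:
        if rng.random() < rate:
            out.append(("<dropped>", ood_action))
        else:
            out.append((user_turn, system_action))
    return out
```

The appeal is data efficiency: the negatives are manufactured from in-domain data alone, so no OOD examples are needed at training time.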

Neural Response Ranking for Social Conversation: A Data-Efficient Approach

Nov 02, 2018
Igor Shalyminov, Ondřej Dušek, Oliver Lemon

The overall objective of 'social' dialogue systems is to support engaging, entertaining, and lengthy conversations on a wide variety of topics, including social chit-chat. Apart from raw dialogue data, user-provided ratings are the most common signal used to train such systems to produce engaging responses. In this paper we show that social dialogue systems can be trained effectively from raw unannotated data. Using a dataset of real conversations collected in the 2017 Alexa Prize challenge, we developed a neural ranker for selecting 'good' system responses to user utterances, i.e. responses which are likely to lead to long and engaging conversations. We show that (1) our neural ranker consistently outperforms several strong baselines when trained to optimise for user ratings; and (2) when trained on larger amounts of data and only using conversation length as the objective, the ranker performs better than the one trained using ratings, ultimately reaching a Precision@1 of 0.87. This advance will make data collection for social conversational agents simpler and less expensive in the future.

* Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 1-8. ISBN 978-1-948087-75-9  
* 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI. Brussels, Belgium, October 31, 2018 
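The length objective means training targets can be derived from raw dialogues alone. A minimal sketch of target construction and top-1 response selection, with an assumed normalisation scheme and a caller-supplied scoring function standing in for the neural ranker:

```python
def length_targets(dialogues):
    """Derive training targets from raw data: each (context, response) pair
    inherits the length of the dialogue it came from, normalised to (0, 1]
    over the batch; no human ratings or annotations are needed."""
    pairs = []
    for turns in dialogues:
        for i in range(1, len(turns)):
            pairs.append((tuple(turns[:i]), turns[i], len(turns)))
    longest = max(len(t) for t in dialogues)
    return [(ctx, resp, n / longest) for ctx, resp, n in pairs]

def select_response(context, candidates, score):
    """Rank candidate responses by predicted conversation length and return
    the top one (the Precision@1 decision)."""
    return max(candidates, key=lambda c: score(context, c))
```

The key point is that `length_targets` consumes nothing but the dialogues themselves, which is what makes data collection cheaper than with ratings.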

Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems

Oct 08, 2018
Igor Shalyminov, Arash Eshghi, Oliver Lemon

Spontaneous spoken dialogue is often disfluent, containing pauses, hesitations, self-corrections and false starts. Processing such phenomena is essential in understanding a speaker's intended meaning and controlling the flow of the conversation. Furthermore, this processing needs to be word-by-word incremental to allow further downstream processing to begin as early as possible in order to handle real spontaneous human conversational behaviour. In addition, from a developer's point of view, it is highly desirable to be able to develop systems which can be trained from 'clean' examples while also being able to generalise to the very diverse disfluent variations on the same data, thereby enhancing both data-efficiency and robustness. In this paper, we present a multi-task LSTM-based model for incremental detection of disfluency structure, which can be hooked up to any component for incremental interpretation (e.g. an incremental semantic parser), or else simply used to 'clean up' the current utterance as it is being produced. We train the system on the Switchboard Dialogue Acts (SWDA) corpus and present its accuracy on this dataset. Our model outperforms prior neural-network-based incremental approaches by about 10 percentage points on SWDA while employing a simpler architecture. To test the model's generalisation potential, we evaluate the same model on the bAbI+ dataset, without any additional training. bAbI+ is a dataset of synthesised goal-oriented dialogues where we control the distribution of disfluencies and their types. This shows that our approach has good generalisation potential, and sheds more light on which types of disfluency might be amenable to domain-general processing.

* 9 pages, 1 figure, 7 tables. Accepted as a full paper for SemDial 2018 
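The word-by-word 'clean up' usage can be sketched with a simplified tag inventory (the model's actual incremental disfluency-structure scheme is richer); `tag(prefix)` stands in for the tagger, labelling the newest token as 'f' (fluent, keep), 'e' (edit term such as a filled pause, drop), or 'rm-N' (repair onset: retract the previous N reparandum words, then keep the current token).

```python
def incremental_clean(tokens, tag):
    """Consume tokens one at a time, querying the tagger on each prefix and
    maintaining a running cleaned-up hypothesis of the utterance so far."""
    cleaned = []
    for i in range(len(tokens)):
        label = tag(tokens[: i + 1])
        if label == "f":
            cleaned.append(tokens[i])
        elif label.startswith("rm-"):
            n = int(label.split("-")[1])
            del cleaned[len(cleaned) - n:]  # retract the reparandum words
            cleaned.append(tokens[i])
        # 'e' (edit term): dropped from the hypothesis
    return cleaned
```

For "i want a flight uh a ticket", the repair onset at the second "a" retracts "a flight", yielding "i want a ticket" without ever looking ahead, which is what lets downstream processing start early.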