Abstract: The publication rates are skyrocketing across many fields of science, and it is difficult to stay up to date with the latest research. This makes automatically summarizing the latest findings and helping scholars to synthesize related work in a given area an attractive research objective. In this paper we study the problem of citation text generation, where given a set of cited papers and a citing context, the model should generate a citation text. While citation text generation has been tackled in prior work, existing studies use different datasets and task definitions, which makes it hard to study citation text generation systematically. To address this, we propose CiteBench: a benchmark for citation text generation that unifies the previous datasets and enables standardized evaluation of citation text generation models across task settings and domains. Using the new benchmark, we investigate the performance of multiple strong baselines, test their transferability between the datasets, and deliver new insights into task definition and evaluation to guide future research in citation text generation. We make CiteBench publicly available at https://github.com/UKPLab/citebench.
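To make the task concrete, here is a minimal sketch of the citation text generation setup: a generic seq2seq baseline produces a citation sentence from the cited papers and citing context, and the output is scored with ROUGE. The model choice, input format, and example texts are assumptions for illustration; the actual CiteBench loaders, baselines, and metrics are documented in the repository linked above.

```python
# A minimal sketch of citation text generation with a seq2seq baseline and
# ROUGE evaluation; model name and input format are assumptions, not the
# CiteBench API (see https://github.com/UKPLab/citebench for the real setup).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import evaluate

model_name = "allenai/led-base-16384"  # assumed long-input baseline
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical example: cited-paper abstract plus the citing context.
source = (
    "Cited abstract: We present a dataset for scientific summarization ... "
    "Citing context: Prior work has explored summarizing scientific papers "
    "<cite>."
)
inputs = tokenizer(source, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64)
prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Score the generated citation text against a (made-up) reference.
rouge = evaluate.load("rouge")
reference = "Smith et al. introduce a dataset for scientific summarization."
print(rouge.compute(predictions=[prediction], references=[reference]))
```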
Abstract: According to the Global Burden of Disease list provided by the World Health Organization (WHO), mental disorders are among the most debilitating disorders. To improve diagnosis and therapy effectiveness, researchers have in recent years tried to identify individual biomarkers. Gathering neurobiological data, however, is costly and time-consuming. Another potential source of information, which is already part of the clinical routine, is therapist-patient dialogues. While there are some pioneering works investigating the role of language as a predictor for various therapeutic parameters, for example the patient-therapist alliance, there are no large-scale studies. A major obstacle to conducting such studies is the availability of sizeable datasets, which are needed to train machine learning models. While these conversations are part of the daily routine of clinicians, gathering them is usually hindered by various ethical (purpose of data usage), legal (data privacy) and technical (data formatting) limitations. Some of these limitations are particular to the domain of therapy dialogues, such as the increased difficulty of anonymisation or the transcription of the recordings. In this paper, we elaborate on the challenges we faced in starting our collection of therapist-patient dialogues in a psychiatry clinic under the General Data Protection Regulation of the European Union, with the goal of using the data for Natural Language Processing (NLP) research. We give an overview of each step in our procedure and point out the potential pitfalls to motivate further research in this field.
Abstract: Emotions are experienced and expressed through various response systems. Coherence between emotional experience and emotional expression is considered important to clients' well-being. To date, emotional coherence (EC) has been studied at a single time point using lab-based tasks with relatively small datasets. No study has examined EC between the subjective experience of emotions and emotion expression in therapy, or whether this coherence is associated with clients' well-being. Natural Language Processing (NLP) approaches have been applied to identify emotions from psychotherapy dialogue and can be implemented to study emotional processes on a larger scale. However, these methods have yet to be used to study coherence between emotional experience and emotional expression over the course of therapy and whether it relates to clients' well-being. This work presents an end-to-end approach where we use emotion predictions from our transformer-based emotion recognition model to study emotional coherence and its diagnostic potential in psychotherapy research. We first employ our transformer-based approach on a Hebrew psychotherapy dataset to automatically label clients' emotions at the utterance level in psychotherapy dialogues. We subsequently investigate the emotional coherence between clients' self-reported emotional states and our model-based emotion predictions. We also examine the association between emotional coherence and clients' well-being. Our findings indicate a significant correlation between clients' self-reported emotions and the positive and negative emotions expressed verbally during psychotherapy sessions. Coherence in positive emotions was also highly correlated with clients' well-being. These results illustrate how NLP can be applied to identify important emotional processes in psychotherapy to improve diagnosis and treatment for clients suffering from mental health problems.
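As a rough illustration of the coherence analysis, the sketch below correlates per-session self-reported emotion scores with session-level rates of model-predicted emotional expression. The data, the binary utterance labels, and the aggregation to session level are invented placeholders, not the paper's actual setup.

```python
# A sketch of the coherence analysis, assuming per-session self-reports and
# per-utterance model predictions already exist; values are illustrative.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: one self-reported positive-emotion score per session,
# and per-utterance model labels (1 = positive emotion expressed, 0 = not).
self_reported = np.array([3.5, 2.0, 4.1, 1.8, 3.0])
predicted_per_session = [
    [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1], [1, 0, 1, 0],
]

# Aggregate utterance-level predictions to a session-level expression rate.
expression_rate = np.array([np.mean(s) for s in predicted_per_session])

# Emotional coherence: correlation between experience and expression.
r, p = pearsonr(self_reported, expression_rate)
print(f"coherence r={r:.2f}, p={p:.3f}")
```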
Abstract: Depressive disorders constitute a severe public health issue worldwide. However, public health systems have limited capacity for case detection and diagnosis. In this regard, the widespread use of social media has opened up a way to access public information on a large scale. Computational methods can serve as support tools for rapid screening by exploiting this user-generated social media content. This paper presents an efficient semantic pipeline to study depression severity in individuals based on their social media writings. We select test user sentences for producing semantic rankings over an index of representative training sentences corresponding to depressive symptoms and severity levels. Then, we use the sentences from those results as evidence for predicting users' symptom severity. For that, we explore different aggregation methods to answer one of four Beck Depression Inventory (BDI) options per symptom. We evaluate our methods on two Reddit-based benchmarks, achieving a 30% improvement over the state of the art in measuring depression severity.
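The retrieval-and-vote idea can be sketched as follows: embed an index of severity-annotated sentences, retrieve the nearest neighbors of a user's sentences, and aggregate the retrieved severity labels to pick one of the four BDI options. The encoder, the example index sentences, and the majority-vote aggregation are assumptions for illustration.

```python
# A minimal sketch of semantic ranking plus label aggregation; the index
# sentences and the BDI options shown are invented placeholders.
from collections import Counter
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

# Index of representative sentences, each tied to a BDI option (0-3)
# for one symptom (here: sadness).
index_sentences = [
    ("I do not feel sad.", 0),
    ("I feel sad much of the time.", 1),
    ("I am sad all the time.", 2),
    ("I am so sad I can't stand it.", 3),
]
index_emb = model.encode([s for s, _ in index_sentences], convert_to_tensor=True)

user_sentences = ["lately everything feels heavy and I cry most days"]
user_emb = model.encode(user_sentences, convert_to_tensor=True)

# Rank index sentences per user sentence and vote over the top hits.
hits = util.semantic_search(user_emb, index_emb, top_k=2)
votes = Counter(index_sentences[h["corpus_id"]][1] for h in hits[0])
print("predicted BDI option:", votes.most_common(1)[0][0])
```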
Abstract: Peer review is a core component of scholarly publishing, yet it is time-consuming, requires considerable expertise, and is prone to error. Applications of NLP for peer reviewing assistance aim to mitigate these issues, but the lack of clearly licensed datasets and multi-domain corpora prevents the systematic study of NLP for peer review. To remedy this, we introduce NLPeer, the first ethically sourced multi-domain corpus of more than 5k papers and 11k review reports from five different venues. In addition to the new datasets of paper drafts, camera-ready versions and peer reviews from the NLP community, we establish a unified data representation and augment previous peer review datasets to include parsed, structured paper representations, rich metadata and versioning information. Our work paves the way towards a systematic, multi-faceted, evidence-based study of peer review in NLP and beyond. We make NLPeer publicly available.
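Purely as an illustration of what a unified data representation might capture, the hypothetical schema below bundles paper versions, review reports, and metadata into one record. The actual NLPeer field names and structure are defined by the released corpus, not by this sketch.

```python
# A hypothetical schema for a unified peer-review record; field names and
# structure are assumptions, not the released NLPeer format.
from dataclasses import dataclass, field

@dataclass
class PaperVersion:
    version_tag: str            # e.g. "draft" or "camera-ready"
    sections: list[str]         # parsed, structured paper text

@dataclass
class Review:
    report: str                 # the review text
    scores: dict[str, float]    # venue-specific rating fields

@dataclass
class PeerReviewRecord:
    paper_id: str
    venue: str
    versions: list[PaperVersion] = field(default_factory=list)
    reviews: list[Review] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
```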
Abstract: Natural language processing researchers develop models of grammar, meaning and human communication based on written text. Due to task and data differences, what is considered text can vary substantially across studies. A conceptual framework for systematically capturing these differences is lacking. We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. Towards that goal, we propose common terminology to discuss the production and transformation of textual data, and introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling. We apply this taxonomy to survey existing work that extends the notion of text beyond the conservative language-centered view. We outline key desiderata and challenges of the emerging inclusive approach to text in NLP, and suggest systematic community-level reporting as a crucial next step to consolidate the discussion.
Abstract: Stance detection deals with the identification of an author's stance towards a target and is applied to various text domains like social media and news. In many cases, inferring the stance is challenging due to insufficient access to contextual information. Complementary context can be found in knowledge bases, but its graph structure makes integration into pretrained language models non-trivial. In contrast, we explore an approach to integrate contextual information as text, which aligns better with transformer architectures. Specifically, we train a model consisting of dual encoders which exchange information via cross-attention. This architecture allows for integrating contextual information from heterogeneous sources. We evaluate context extracted from structured knowledge sources and from prompting large language models. Our approach outperforms competitive baselines (by 1.9pp on average) on a large and diverse stance detection benchmark, both (1) in-domain, i.e. for seen targets, and (2) out-of-domain, i.e. for targets unseen during training. Our analysis shows that it is able to regularize against spurious correlations between labels and target-specific cue words.
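A simplified sketch of the dual-encoder architecture is given below: two pretrained encoders process the input and the textual context separately, a cross-attention layer lets input tokens attend over context tokens, and a classifier predicts the stance. The layer sizes, the single cross-attention step, and classifying from the first token are illustrative assumptions.

```python
# A simplified dual-encoder stance model with cross-attention between the
# two encoders; hyperparameters and architecture details are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class DualEncoderStance(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=3):
        super().__init__()
        self.input_encoder = AutoModel.from_pretrained(model_name)
        self.context_encoder = AutoModel.from_pretrained(model_name)
        hidden = self.input_encoder.config.hidden_size
        # Cross-attention: input tokens attend over context tokens.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8,
                                                batch_first=True)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, ctx_ids, ctx_mask):
        h_in = self.input_encoder(input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        h_ctx = self.context_encoder(ctx_ids,
                                     attention_mask=ctx_mask).last_hidden_state
        # Exchange information: queries from the input, keys/values from context.
        fused, _ = self.cross_attn(query=h_in, key=h_ctx, value=h_ctx,
                                   key_padding_mask=~ctx_mask.bool())
        return self.classifier(fused[:, 0])  # classify from the [CLS] position
```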
Abstract: We compare sequential fine-tuning with a multi-task learning model in a setting where we are interested in boosting performance on two tasks, one of which depends on the other. We test these models on the FigLang2022 shared task, which requires participants to predict natural language inference labels on figurative language along with corresponding textual explanations of the inference predictions. Our results show that while sequential multi-task learning can be tuned to perform well on the first of the two target tasks, it performs less well on the second and additionally struggles with overfitting. Our findings show that simple sequential fine-tuning of text-to-text models is an extraordinarily powerful method for cross-task knowledge transfer while simultaneously predicting multiple interdependent targets; so much so that our best model achieved the (tied) highest score on the task.
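A small sketch of how the two stages could be formatted as text-to-text data: the model is first fine-tuned to emit the inference label alone, then the same checkpoint is trained to emit the label together with its explanation. The prompt format and the example are invented for illustration.

```python
# Illustrative text-to-text formatting for sequential fine-tuning; the
# prompt template and target format are assumptions, not the shared-task spec.
premise = "Time flies like an arrow."
hypothesis = "Time moves very quickly."

# Stage 1 target: the inference label only.
stage1_input = f"premise: {premise} hypothesis: {hypothesis}"
stage1_target = "entailment"

# Stage 2 target: label and explanation, predicted jointly by the same model.
stage2_target = (
    "entailment | explanation: The idiom compares the passage of time to "
    "a fast-moving arrow, so it implies that time moves quickly."
)

# Fine-tune on (stage1_input, stage1_target) pairs first, then continue
# training the same checkpoint on (stage1_input, stage2_target) pairs.
print(stage1_input, "->", stage2_target)
```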
Abstract: Misinformation emerges in times of uncertainty when credible information is limited. This is challenging for NLP-based fact-checking as it relies on counter-evidence, which may not yet be available. Despite increasing interest in automatic fact-checking, it is still unclear if automated approaches can realistically refute harmful real-world misinformation. Here, we contrast and compare NLP fact-checking with how professional fact-checkers combat misinformation in the absence of counter-evidence. In our analysis, we show that, by design, existing NLP task definitions for fact-checking cannot refute misinformation as professional fact-checkers do for the majority of claims. We then define two requirements that the evidence in datasets must fulfill for realistic fact-checking: It must be (1) sufficient to refute the claim and (2) not leaked from existing fact-checking articles. We survey existing fact-checking datasets and find that all of them fail to satisfy both criteria. Finally, we perform experiments to demonstrate that models trained on a large-scale fact-checking dataset rely on leaked evidence, which makes them unsuitable in real-world scenarios. Taken together, we show that current NLP fact-checking cannot realistically combat real-world misinformation because it depends on unrealistic assumptions about counter-evidence in the data.
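Requirement (2) can be illustrated with a toy leakage check: evidence whose source is itself a fact-checking article should not count as independent counter-evidence. The domain list below is a small, assumed example, not an exhaustive registry of fact-checking sites.

```python
# A toy illustration of detecting leaked evidence by source domain; the
# domain list is a non-exhaustive assumption for demonstration purposes.
from urllib.parse import urlparse

FACT_CHECKING_DOMAINS = {"snopes.com", "politifact.com", "factcheck.org"}

def is_leaked(evidence_url: str) -> bool:
    """Flag evidence that originates from a known fact-checking site."""
    domain = urlparse(evidence_url).netloc.lower().removeprefix("www.")
    return domain in FACT_CHECKING_DOMAINS

print(is_leaked("https://www.snopes.com/fact-check/some-claim/"))  # True
print(is_leaked("https://www.who.int/news/item/some-report"))      # False
```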
Abstract: Pairing a lexical retriever with a neural re-ranking model has set the state of the art on large-scale information retrieval datasets. This pipeline covers scenarios like question answering or navigational queries; in information-seeking scenarios, however, users often provide information on whether a document is relevant to their query in the form of clicks or explicit feedback. In this work, we therefore explore how relevance feedback can be directly integrated into neural re-ranking models by adopting few-shot and parameter-efficient learning techniques. Specifically, we introduce a kNN approach that re-ranks documents based on their similarity with the query and with the documents the user considers relevant. Further, we explore Cross-Encoder models that we pre-train using meta-learning and subsequently fine-tune for each query, training only on the feedback documents. To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario. Extensive experiments demonstrate that integrating relevance feedback directly into neural re-ranking models improves their performance, and that fusing lexical ranking with our best-performing neural re-ranker outperforms all other methods by 5.2 points in nDCG@20.
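A minimal sketch of the kNN re-ranking idea: each candidate document is scored by its embedding similarity to the query plus its mean similarity to the user's feedback documents. The encoder choice and the equal weighting of the two terms are assumptions for illustration.

```python
# A minimal sketch of kNN re-ranking with relevance feedback; encoder and
# score weighting are assumptions, and all texts are invented examples.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

query = "effects of caffeine on sleep quality"
feedback_docs = ["Caffeine intake in the evening delays sleep onset ..."]
candidates = [
    "A randomized trial of caffeine and sleep latency in adults.",
    "Coffee production statistics by country, 2020 edition.",
]

q_emb = encoder.encode(query, convert_to_tensor=True)
f_emb = encoder.encode(feedback_docs, convert_to_tensor=True)
c_emb = encoder.encode(candidates, convert_to_tensor=True)

# Score: similarity to the query plus mean similarity to feedback documents.
sim_q = util.cos_sim(c_emb, q_emb).squeeze(-1)
sim_f = util.cos_sim(c_emb, f_emb).mean(dim=1)
scores = sim_q + sim_f

# Re-rank candidates by the combined score, highest first.
for doc, s in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{s:.3f}  {doc}")
```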