James Henderson

Idiap Research Institute

RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question

Nov 09, 2022

Alireza Mohammadshahi, Thomas Scialom, Majid Yazdani, Pouya Yanki, Angela Fan, James Henderson, Marzieh Saeidi

Abstract: Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering and a span scorer module, in which we use pre-trained models from the existing literature, and therefore, our metric can be used without further training. We show that RQUGE has a higher correlation with human judgment without relying on the reference question. RQUGE is shown to be significantly more robust to several adversarial corruptions. Additionally, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on the synthetic data generated by a question generation model and re-ranked by RQUGE.

* 19 pages, 8 figures
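
The two-module design described in the abstract (a question-answering module followed by a span scorer) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the function names, the callable signatures, and the assumption that the span scorer compares the predicted span against a gold answer span are all hypothetical, and `toy_qa` and `toy_scorer` are trivial stand-ins for the pre-trained models.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
# a QA module maps (context, question) to a predicted answer span,
# a span scorer maps (context, question, gold, predicted) to a score.
QAFn = Callable[[str, str], str]
ScorerFn = Callable[[str, str, str, str], float]


def answerability_score(context: str, question: str, gold_answer: str,
                        qa_fn: QAFn, scorer_fn: ScorerFn) -> float:
    """Score a candidate question without any reference question."""
    predicted = qa_fn(context, question)
    return scorer_fn(context, question, gold_answer, predicted)


def rerank(context: str, gold_answer: str, candidates: List[str],
           qa_fn: QAFn, scorer_fn: ScorerFn,
           top_k: int) -> List[Tuple[float, str]]:
    """Keep the top_k candidate questions, highest score first."""
    scored = [(answerability_score(context, q, gold_answer, qa_fn, scorer_fn), q)
              for q in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return scored[:top_k]


def toy_qa(context: str, question: str) -> str:
    # Stand-in QA module: answers "where" questions with the word after "in".
    words = context.rstrip(".").split()
    if question.lower().startswith("where") and "in" in words:
        return words[words.index("in") + 1]
    return ""


def toy_scorer(context: str, question: str, gold: str, predicted: str) -> float:
    # Stand-in span scorer: exact match between predicted and gold spans.
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


context = "The workshop was held in Lisbon."
candidates = ["Who held the workshop?", "Where was the workshop held?"]
best = rerank(context, "Lisbon", candidates, toy_qa, toy_scorer, top_k=1)
```

In this toy run, `best` keeps only the question that the stand-in QA module can answer from the context, which mirrors the re-ranking of synthetic questions mentioned at the end of the abstract.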

A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck

Aug 12, 2022

James Henderson, Fabio Fehr

Abstract: We propose a VAE for Transformers by developing a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors supported by attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. This allows NVIB to regularise the number of vectors accessible with attention, as well as the amount of information in individual vectors. By regularising the cross-attention of a Transformer encoder-decoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments on training an NVAE on natural language text show that the induced embedding space has the desired properties of a VAE for Transformers.

* 33 pages, 10 figures, 3 tables. First time this work has been made public

Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers

May 23, 2022

Luis Espinosa-Anke, Alexander Shvets, Alireza Mohammadshahi, James Henderson, Leo Wanner

Abstract: Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.

* Accepted to *SEM2022

What Do Compressed Multilingual Machine Translation Models Forget?

May 22, 2022

Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier

Abstract: Recently, very large pre-trained models have achieved state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques can drastically reduce the size of the model, and therefore its inference time, with negligible impact on top-tier metrics. However, the general performance hides a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the model. In this work, we analyze the impacts of compression methods on Multilingual Neural Machine Translation models (MNMT) for various language groups and semantic features by extensive analysis of compressed models on different NMT benchmarks, e.g. FLORES-101, MT-Gender, and DiBiMT. Our experiments show that the performance of under-represented languages drops significantly, while the average BLEU score decreases only slightly. Interestingly, the removal of noisy memorization with the compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that the compression amplifies intrinsic gender and semantic biases, even in high-resource languages.

* 20 pages, 12 Figures, 9 Tables

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models

Apr 03, 2022

Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Veselin Stoyanov, Majid Yazdani

Abstract: Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. PERFECT makes two key design choices: First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by factors of roughly 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex auto-regressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. Our code is publicly available at https://github.com/rabeehk/perfect.

* ACL, 2022

Graph Refinement for Coreference Resolution

Mar 30, 2022

Lesly Miculicich, James Henderson

Abstract: The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves coreference resolution.

HyperMixer: An MLP-based Green AI Alternative to Transformers

Mar 07, 2022

Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, Francois Marelli, Francois Fleuret, James Henderson

Abstract: Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length and can be difficult to tune. In the pursuit of Green AI, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
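
To make the idea of forming the token-mixing MLP dynamically more concrete, here is a rough pure-Python sketch. It is a simplification under stated assumptions and not the authors' implementation: the class name `HyperTokenMixer`, the use of a single linear map per hypernetwork, the GELU activation, and the random initialisation are all illustrative choices that the abstract does not specify.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def gelu(x: float) -> float:
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class HyperTokenMixer:
    """Token mixing whose weights are generated from the input tokens.

    Assumption: each hypernetwork is one linear map applied per token.
    """

    def __init__(self, d_model: int, d_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.h1 = random_matrix(d_model, d_hidden, rng)  # hypernetwork for W1
        self.h2 = random_matrix(d_model, d_hidden, rng)  # hypernetwork for W2

    def __call__(self, x: Matrix) -> Matrix:
        w1 = matmul(x, self.h1)  # (n_tokens, d_hidden), one row per token
        w2 = matmul(x, self.h2)  # (n_tokens, d_hidden)
        # Mix across tokens: (d_hidden, n_tokens) @ (n_tokens, d_model)
        hidden = [[gelu(v) for v in row] for row in matmul(transpose(w1), x)]
        return matmul(w2, hidden)  # (n_tokens, d_model)


rng = random.Random(1)
mixer = HyperTokenMixer(d_model=4, d_hidden=3)
short_out = mixer(random_matrix(2, 4, rng))  # works for 2 tokens
long_in = random_matrix(5, 4, rng)
long_out = mixer(long_in)                    # and for 5 tokens, same weights
```

Because the mixing weights are computed from the tokens themselves, the same parameters handle inputs of any length, whereas a static token-mixing MLP has its weight shapes tied to a fixed number of tokens.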

Bag-of-Vectors Autoencoders for Unsupervised Conditional Text Generation

Oct 13, 2021

Florian Mai, James Henderson

Abstract: Text autoencoders are often used for unsupervised conditional text generation by applying mappings in the latent space to change attributes to the desired values. Recently, Mai et al. (2020) proposed Emb2Emb, a method to learn these mappings in the embedding space of an autoencoder. However, their method is restricted to autoencoders with a single-vector embedding, which limits how much information can be retained. We address this issue by extending their method to Bag-of-Vectors Autoencoders (BoV-AEs), which encode the text into a variable-size bag of vectors that grows with the size of the text, as in attention-based models. This allows encoding and reconstructing much longer texts than standard autoencoders. Analogous to conventional autoencoders, we propose regularization techniques that facilitate learning meaningful operations in the latent space. Finally, we adapt a training scheme that learns to map an input bag to an output bag, including a novel loss function and neural architecture. Our experimental evaluations on unsupervised sentiment transfer and sentence summarization show that our method performs substantially better than a standard autoencoder.

* Under review

Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning

Sep 04, 2021

Christos Theodoropoulos, James Henderson, Andrei C. Coman, Marie-Francine Moens

Abstract: Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive learning to impose relation-related structure on the token-level representations of the sentence obtained with a CharacterBERT (El Boukkouri et al., 2020) model. The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task using only a simple KNN classifier, thereby demonstrating the success of the proposed method. Additional visualization by a tSNE analysis shows the effectiveness of the learned representation space compared to baselines. Furthermore, we show that we can learn a different space for named entity recognition, again using a contrastive learning objective, and demonstrate how to successfully combine both representation spaces in an entity-relation task.

* To be presented at CoNLL 2021
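
The abstract's point that a simple KNN classifier suffices once the embedding space is well structured can be illustrated with the sketch below. It is an assumption-laden toy, not the paper's setup: cosine similarity, `k=3`, the function names, the two-dimensional vectors, and the relation labels are all made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query: Sequence[float], train_vecs: List[Sequence[float]],
                train_labels: List[str], k: int = 3) -> str:
    """Majority vote over the k training embeddings most similar to query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up embeddings in which each relation type forms its own cluster.
train_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["works_for", "works_for", "located_in", "located_in"]

pred_a = knn_predict([0.8, 0.2], train_vecs, train_labels)
pred_b = knn_predict([0.2, 0.8], train_vecs, train_labels)
```

When same-relation embeddings cluster together, as in this toy data, nearest-neighbour voting needs no trained classifier on top of the embeddings.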

The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task

Jul 05, 2021

James Barry, Alireza Mohammadshahi, Joachim Wagner, Jennifer Foster, James Henderson

Abstract: We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which are an extension of the basic dependency trees designed to better represent semantic structure. Evaluation is carried out on 29 treebanks in 17 languages and participants are required to parse the data from each language starting from raw strings. Our approach uses the Stanza pipeline to preprocess the text files, XLM-RoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a post-processing script to ensure all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments which include using Trankit for pre-processing, XLM-RoBERTa-LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve our initial score and our final system has a coarse ELAS of 88.04.

* Submitted to the IWPT 2021 Shared Task: From Raw Text to Enhanced Universal Dependencies: the Parsing Shared Task at IWPT 2021
