Jacopo Staiano

QuestEval: Summarization Asks for Fact-based Evaluation

Apr 09, 2021

Thomas Scialom, Paul-Alexis Dray, Patrick Gallinari, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano, Alex Wang

Abstract: Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate this issue, recent work has proposed evaluation metrics which rely on question answering models to assess whether a summary contains all the relevant information in its source document. Though promising, the proposed approaches have so far failed to correlate better than ROUGE with human judgments. In this paper, we extend previous approaches and propose a unified framework, named QuestEval. In contrast to established metrics such as ROUGE or BERTScore, QuestEval does not require any ground-truth reference. Nonetheless, QuestEval substantially improves the correlation with human judgments over four evaluation dimensions (consistency, coherence, fluency, and relevance), as shown in the extensive experiments we report.

* project page: https://github.com/recitalAI/QuestEval
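
QA-based metrics of this kind compare the answers a QA model gives when reading the source document with the answers it gives when reading the summary. A common way to compare two short answers is SQuAD-style token-overlap F1. The sketch below shows only that comparison step, assuming the answers have already been produced by some QA model; it is not QuestEval's implementation (the project page above hosts that), and the example answers are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, then split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between two answer strings (SQuAD-style)."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers to one generated question, read from each text.
answer_from_source = "the Eiffel Tower"
answer_from_summary = "Eiffel Tower in Paris"
print(round(token_f1(answer_from_summary, answer_from_source), 3))  # → 0.667
```

Partial credit matters here: the summary's answer contains the right entity plus extra words, so it scores 2/3 rather than 0 as an exact-match comparison would.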

Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering

Oct 23, 2020

Arij Riabi, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, Jacopo Staiano

Abstract: Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performance of state-of-the-art multilingual models is significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one desires to support. We propose a method to improve Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method significantly outperforms the baselines trained on English data only. We report new state-of-the-art results on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr).

* 7 pages

Toward Stance-based Personas for Opinionated Dialogues

Oct 07, 2020

Thomas Scialom, Serra Sinem Tekiroglu, Jacopo Staiano, Marco Guerini

Abstract: In the context of chit-chat dialogues, it has been shown that endowing systems with a persona profile is important to produce more coherent and meaningful conversations. Still, the representation of such personas has thus far been limited to a fact-based representation (e.g. "I have two cats."). We argue that these representations remain superficial with respect to the complexity of human personality. In this work, we take a step forward and investigate stance-based personas, trying to grasp more profound characteristics, such as opinions, values, and beliefs, to drive language generation. To this end, we introduce a novel dataset that allows us to explore different stance-based persona representations and their impact on claim generation, showing that they are able to grasp abstract and profound aspects of the author persona.

* Accepted at Findings of EMNLP 2020

Project PIAF: Building a Native French Question-Answering Dataset

Jul 02, 2020

Rachel Keraron, Guillaume Lancrenon, Mathilde Bras, Frédéric Allary, Gilles Moyse, Thomas Scialom, Edmundo-Pavel Soriano-Morales, Jacopo Staiano

Abstract: Motivated by the lack of data for non-English languages, in particular for the evaluation of downstream tasks such as Question Answering, we present a participatory effort to collect a native French Question Answering dataset. Furthermore, we describe and publicly release the annotation tool developed for our collection effort, along with the data obtained and preliminary baselines.

* LREC 2020

ColdGANs: Taming Language GANs with Cautious Sampling Strategies

Jun 08, 2020

Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

Abstract: Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poorly generated text sequences. At the root of these limitations is the mismatch between training and inference, i.e. the so-called exposure bias, exacerbated by considering only the reference texts as correct, while in practice several alternative formulations could be as good. Generative Adversarial Networks (GANs) can mitigate those limitations, but the discrete nature of text has hindered their application to language generation: the approaches proposed so far, based on Reinforcement Learning, have been shown to underperform MLE. Departing from previous works, we analyze the exploration step in GANs applied to text generation, and show how classical sampling results in unstable training. We propose to consider alternative exploration strategies in a GAN framework that we name ColdGANs, where we force the sampling to be close to the distribution modes to get smoother learning dynamics. For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE, and obtain improvements over the state-of-the-art on three generative tasks, namely unconditional text generation, question generation, and abstractive summarization.
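
One simple way to push sampling toward the modes of a distribution is to lower the softmax temperature: dividing the logits by a temperature T < 1 sharpens the distribution. The sketch below illustrates only this general idea with hypothetical logits; the specific "cold" strategies ColdGANs uses are described in the paper and are not reproduced here.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token logits over a 4-word vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]
for t in (1.0, 0.5, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print(sample_token(logits, 0.2, rng))
```

At T = 0.2 almost all probability mass sits on the most likely token, whereas at T = 1.0 the alternatives keep noticeable mass and are sampled far more often.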

MLSUM: The Multilingual Summarization Corpus

Apr 30, 2020

Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

Abstract: We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages: French, German, Spanish, Russian, and Turkish. Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These highlight existing biases which motivate the use of a multilingual dataset.

BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations

Feb 25, 2020

Thomas Scialom, Patrick Bordes, Paul-Alexis Dray, Jacopo Staiano, Patrick Gallinari

Abstract: Pre-trained language models such as BERT have recently contributed to significant advances in Natural Language Processing tasks. Interestingly, while multilingual BERT models have demonstrated impressive results, recent works have shown how monolingual BERT can also be competitive in zero-shot cross-lingual settings. This suggests that the abstractions learned by these models can transfer across languages, even when trained on monolingual data. In this paper, we investigate whether such generalization potential applies to other modalities, such as vision: does BERT contain abstractions that generalize beyond text? We introduce BERT-gen, an architecture for text generation based on BERT, able to leverage either mono- or multimodal representations. The results reported under different configurations indicate a positive answer to our research question, and the proposed model obtains substantial improvements over the state-of-the-art on two established Visual Question Generation datasets.

Discriminative Adversarial Search for Abstractive Summarization

Feb 24, 2020

Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

Abstract: We introduce a novel approach for sequence decoding, Discriminative Adversarial Search (DAS), which has the desirable properties of alleviating the effects of exposure bias without requiring external metrics. Inspired by Generative Adversarial Networks (GANs), wherein a discriminator is used to improve the generator, our method differs from GANs in that the generator parameters are not updated at training time and the discriminator is only used to drive sequence generation at inference time. We investigate the effectiveness of the proposed approach on the task of Abstractive Summarization: the results obtained show that a naive application of DAS improves over the state-of-the-art methods, with further gains obtained via discriminator retraining. Moreover, we show how DAS can be effective for cross-domain adaptation. Finally, all results reported are obtained without additional rule-based filtering strategies, commonly used by the best performing systems available: this indicates that DAS can effectively be deployed without relying on post-hoc modifications of the generated outputs.
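
The abstract says only that a frozen generator's outputs are steered by a discriminator at inference time. As a rough illustration of that idea, the sketch below re-scores a set of complete candidates by mixing the generator log-probability with the log of a discriminator score. The mixing rule, the weight `alpha`, and `toy_discriminator` are all hypothetical assumptions, not the DAS scoring procedure from the paper.

```python
import math
from typing import Callable


def rerank_candidates(
    candidates: list[tuple[str, float]],
    discriminator: Callable[[str], float],
    alpha: float = 0.5,
    beam_size: int = 2,
) -> list[tuple[str, float]]:
    """Keep the top `beam_size` candidates by a mixed generator/discriminator score.

    Each candidate is (text, generator_log_prob). The discriminator returns a
    score in [0, 1] for how human-like the text looks. The combined score is an
    assumption for illustration:
        (1 - alpha) * log p_gen + alpha * log D(text)
    """
    scored = []
    for text, log_p in candidates:
        d = max(discriminator(text), 1e-12)  # avoid log(0)
        score = (1 - alpha) * log_p + alpha * math.log(d)
        scored.append((text, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:beam_size]


def toy_discriminator(text: str) -> float:
    """Hypothetical stand-in: share of distinct words, so repetitive text scores low."""
    words = text.split()
    return len(set(words)) / len(words) if words else 0.0


candidates = [
    ("the cat sat on the mat", -2.0),
    ("the cat the cat the cat", -1.5),  # most likely under the generator alone
    ("a cat sat", -3.0),
]
for text, score in rerank_candidates(candidates, toy_discriminator):
    print(f"{score:.3f}  {text}")
```

With `alpha=0.0` the ranking follows the generator alone and the repetitive candidate wins; with the default `alpha=0.5` the discriminator pushes it down to second place behind "the cat sat on the mat".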

Ask to Learn: A Study on Curiosity-driven Question Generation

Nov 08, 2019

Thomas Scialom, Jacopo Staiano

Abstract: We propose a novel text generation task, namely Curiosity-driven Question Generation. We start from the observation that the Question Generation task has traditionally been considered the dual problem of Question Answering, hence tackling the problem of generating a question given the text that contains its answer. Such questions can be used to evaluate machine reading comprehension. However, in real life, and especially in conversational settings, humans tend to ask questions with the goal of enriching their knowledge and/or clarifying aspects of previously gathered information. We refer to these inquisitive questions as Curiosity-driven: these questions are generated with the goal of obtaining new information (the answer) which is not present in the input text. In this work, we experiment on this new task using a conversational Question Answering (QA) dataset; further, since the majority of QA datasets are not built in a conversational manner, we describe a methodology to derive data for this novel task from non-conversational QA data. We investigate several automated metrics to measure the different properties of Curious Questions, and experiment with different approaches on the Curiosity-driven Question Generation task, including model pre-training and reinforcement learning. Finally, we report a qualitative evaluation of the generated outputs.

* 13 pages, 3 figures

Answers Unite! Unsupervised Metrics for Reinforced Summarization Models

Sep 04, 2019

Thomas Scialom, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

Abstract: Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL makes it possible to consider complex, possibly non-differentiable, metrics that globally assess the quality and relevance of the generated outputs. ROUGE, the most widely used summarization metric, is known to suffer from a bias towards lexical similarity as well as from suboptimal accounting for the fluency and readability of the generated abstracts. We thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, compare favorably to ROUGE, with the additional property of not requiring reference summaries. Training an RL-based model on these metrics leads to improvements (in terms of both human and automated metrics) over current approaches that use ROUGE as a reward.

* Accepted at EMNLP 2019
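
To make the lexical bias of ROUGE concrete, the sketch below computes a simplified ROUGE-1 recall (lowercased whitespace tokens, no stemming; not the official ROUGE package). The sentences are made up for illustration: a fluent paraphrase scores low, while a shuffled, unreadable copy of the reference scores perfectly.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall: share of reference tokens (with counts) found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped unigram matches
    return overlap / sum(ref.values())


reference = "the company reported higher profits this year"
paraphrase = "the firm announced increased earnings in 2019"
word_salad = "year this profits higher reported company the"

print(rouge1_recall(paraphrase, reference))  # → 1/7 ≈ 0.143
print(rouge1_recall(word_salad, reference))  # → 1.0
```

Because the score counts only surface-token matches, it rewards word overlap regardless of meaning or fluency, which is the gap the reference-free QA-based metrics above aim to address.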
