Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shashi Narayan

SMART: Sentences as Basic Units for Text Evaluation

Aug 01, 2022

Reinald Kim Amplayo, Peter J. Liu, Yao Zhao, Shashi Narayan

Figure 1 for SMART: Sentences as Basic Units for Text Evaluation

Figure 2 for SMART: Sentences as Basic Units for Text Evaluation

Figure 3 for SMART: Sentences as Basic Units for Text Evaluation

Figure 4 for SMART: Sentences as Basic Units for Text Evaluation

Abstract:Widely used evaluation metrics for text generation either do not work well with longer texts or fail to evaluate all aspects of text quality. In this paper, we introduce a new metric called SMART to mitigate such limitations. Specifically, We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences. Candidate sentences are also compared to sentences in the source documents to allow grounding (e.g., factuality) evaluation. Our results show that system-level correlations of our proposed metric with a model-based matching function outperforms all competing metrics on the SummEval summarization meta-evaluation dataset, while the same metric with a string-based matching function is competitive with current model-based metrics. The latter does not use any neural model, which is useful during model development phases where resources can be limited and fast evaluation is required. Finally, we also conducted extensive analyses showing that our proposed metrics work well with longer summaries and are less biased towards specific models.

* code coming soon

Via

Access Paper or Ask Questions

Conditional Generation with a Question-Answering Blueprint

Jul 01, 2022

Shashi Narayan, Joshua Maynez, Reinald Kim Amplayo, Kuzman Ganchev, Annie Louis, Fantine Huot, Dipanjan Das, Mirella Lapata

Figure 1 for Conditional Generation with a Question-Answering Blueprint

Figure 2 for Conditional Generation with a Question-Answering Blueprint

Figure 3 for Conditional Generation with a Question-Answering Blueprint

Figure 4 for Conditional Generation with a Question-Answering Blueprint

Abstract:The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for both content selection (i.e.,~what to say) and planning (i.e.,~in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.

* 19 pages

Via

Access Paper or Ask Questions

A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

Mar 28, 2022

Shashi Narayan, Gonçalo Simões, Yao Zhao, Joshua Maynez, Dipanjan Das, Michael Collins, Mirella Lapata

Figure 1 for A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

Figure 2 for A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

Figure 3 for A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

Figure 4 for A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

Abstract:We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality compared to previous stochastic decoding strategies. It builds on recently proposed plan-based neural generation models (Narayan et al, 2021) that are trained to first create a composition of the output and then generate by conditioning on it and the input. Our approach avoids text degeneration by first sampling a composition in the form of an entity chain and then using beam search to generate the best possible text grounded to this entity chain. Experiments on summarization (CNN/DailyMail and XSum) and question generation (SQuAD), using existing and newly proposed automatic metrics together with human-based evaluation, demonstrate that Composition Sampling is currently the best available decoding strategy for generating diverse meaningful outputs.

* 21 pages, ACL 2022

Via

Access Paper or Ask Questions

MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

Sep 22, 2021

Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas

Figure 1 for MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

Figure 2 for MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

Figure 3 for MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

Figure 4 for MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

Abstract:One of the most challenging aspects of current single-document news summarization is that the summary often contains 'extrinsic hallucinations', i.e., facts that are not present in the source document, which are often derived via world knowledge. This causes summarization systems to act more like open-ended language models tending to hallucinate facts that are erroneous. In this paper, we mitigate this problem with the help of multiple supplementary resource documents assisting the task. We present a new dataset MiRANews and benchmark existing summarization models. In contrast to multi-document summarization, which addresses multiple events from several source documents, we still aim at generating a summary for a single document. We show via data analysis that it's not only the models which are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded on assisting documents than in the main source articles. An error analysis of generated summaries from pretrained models fine-tuned on MiRANews reveals that this has an even bigger effects on models: assisted summarization reduces 55% of hallucinations when compared to single-document summarization models trained on the main article only. Our code and data are available at https://github.com/XinnuoXu/MiRANews.

* Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing Findings (EMNLP2021 Findings)

Via

Access Paper or Ask Questions

Focus Attention: Promoting Faithfulness and Diversity in Summarization

May 25, 2021

Rahul Aralikatte, Shashi Narayan, Joshua Maynez, Sascha Rothe, Ryan McDonald

Figure 1 for Focus Attention: Promoting Faithfulness and Diversity in Summarization

Figure 2 for Focus Attention: Promoting Faithfulness and Diversity in Summarization

Figure 3 for Focus Attention: Promoting Faithfulness and Diversity in Summarization

Figure 4 for Focus Attention: Promoting Faithfulness and Diversity in Summarization

Abstract:Professional summaries are written with document-level information, such as the theme of the document, in mind. This is in contrast with most seq2seq decoders which simultaneously learn to focus on salient content, while deciding what to generate, at each decoding step. With the motivation to narrow this gap, we introduce Focus Attention Mechanism, a simple yet effective method to encourage decoders to proactively generate tokens that are similar or topical to the input document. Further, we propose a Focus Sampling method to enable generation of diverse summaries, an area currently understudied in summarization. When evaluated on the BBC extreme summarization task, two state-of-the-art models augmented with Focus Attention generate summaries that are closer to the target and more faithful to their input documents, outperforming their vanilla counterparts on \rouge and multiple faithfulness measures. We also empirically demonstrate that Focus Sampling is more effective in generating diverse and faithful summaries than top-$k$ or nucleus sampling-based decoding methods.

* ACL 2021

Via

Access Paper or Ask Questions

Planning with Entity Chains for Abstractive Summarization

Apr 15, 2021

Shashi Narayan, Yao Zhao, Joshua Maynez, Gonçalo Simoes, Ryan McDonald

Figure 1 for Planning with Entity Chains for Abstractive Summarization

Figure 2 for Planning with Entity Chains for Abstractive Summarization

Figure 3 for Planning with Entity Chains for Abstractive Summarization

Figure 4 for Planning with Entity Chains for Abstractive Summarization

Abstract:Pre-trained transformer-based sequence-to-sequence models have become the go-to solution for many text generation tasks, including summarization. However, the results produced by these models tend to contain significant issues such as hallucinations and irrelevant passages. One solution to mitigate these problems is to incorporate better content planning in neural summarization. We propose to use entity chains (i.e., chains of entities mentioned in the summary) to better plan and ground the generation of abstractive summaries. In particular, we augment the target by prepending it with its entity chain. We experimented with both pre-training and finetuning with this content planning objective. When evaluated on CNN/DailyMail, SAMSum and XSum, models trained with this objective improved on entity correctness and summary conciseness, and achieved state-of-the-art performance on ROUGE for SAMSum and XSum.

* 11 pages

Via

Access Paper or Ask Questions

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Feb 03, 2021

Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole(+45 more)

Figure 1 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Figure 2 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Figure 3 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Figure 4 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Abstract:We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. However, due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of corpora and evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the initial release for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

Via

Access Paper or Ask Questions

Stepwise Extractive Summarization and Planning with Structured Transformers

Oct 06, 2020

Shashi Narayan, Joshua Maynez, Jakub Adamek, Daniele Pighin, Blaž Bratanič, Ryan McDonald

Figure 1 for Stepwise Extractive Summarization and Planning with Structured Transformers

Figure 2 for Stepwise Extractive Summarization and Planning with Structured Transformers

Figure 3 for Stepwise Extractive Summarization and Planning with Structured Transformers

Figure 4 for Stepwise Extractive Summarization and Planning with Structured Transformers

Abstract:We propose encoder-centric stepwise models for extractive summarization using structured transformers -- HiBERT and Extended Transformers. We enable stepwise summarization by injecting the previously generated summary into the structured transformer as an auxiliary sub-structure. Our models are not only efficient in modeling the structure of long inputs, but they also do not rely on task-specific redundancy-aware modeling, making them a general purpose extractive content planner for different tasks. When evaluated on CNN/DailyMail extractive summarization, stepwise models achieve state-of-the-art performance in terms of Rouge without any redundancy aware modeling or sentence filtering. This also holds true for Rotowire table-to-text generation, where our models surpass previously reported metrics for content selection, planning and ordering, highlighting the strength of stepwise modeling. Amongst the two structured transformers we test, stepwise Extended Transformers provides the best performance across both datasets and sets a new standard for these challenges.

* 17 pages, EMNLP 2020

Via

Access Paper or Ask Questions

On Faithfulness and Factuality in Abstractive Summarization

May 02, 2020

Joshua Maynez, Shashi Narayan, Bernd Bohnet, Ryan McDonald

Figure 1 for On Faithfulness and Factuality in Abstractive Summarization

Figure 2 for On Faithfulness and Factuality in Abstractive Summarization

Figure 3 for On Faithfulness and Factuality in Abstractive Summarization

Figure 4 for On Faithfulness and Factuality in Abstractive Summarization

Abstract:It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models lead to less human-like responses for open-ended tasks such as language modeling and story generation. In this paper we have analyzed limitations of these models for abstractive document summarization and found that these models are highly prone to hallucinate content that is unfaithful to the input document. We conducted a large scale human evaluation of several neural abstractive summarization systems to better understand the types of hallucinations they produce. Our human annotators found substantial amounts of hallucinated content in all model generated summaries. However, our analysis does show that pretrained models are better summarizers not only in terms of raw metrics, i.e., ROUGE, but also in generating faithful and factual summaries as evaluated by humans. Furthermore, we show that textual entailment measures better correlate with faithfulness than standard metrics, potentially leading the way to automatic evaluation metrics as well as training and decoding criteria.

* ACL 2020, 14 pages

Via

Access Paper or Ask Questions

QURIOUS: Question Generation Pretraining for Text Generation

Apr 23, 2020

Shashi Narayan, Gonçalo Simoes, Ji Ma, Hannah Craighead, Ryan Mcdonald

Figure 1 for QURIOUS: Question Generation Pretraining for Text Generation

Figure 2 for QURIOUS: Question Generation Pretraining for Text Generation

Figure 3 for QURIOUS: Question Generation Pretraining for Text Generation

Figure 4 for QURIOUS: Question Generation Pretraining for Text Generation

Abstract:Recent trends in natural language processing using pretraining have shifted focus towards pretraining and fine-tuning approaches for text generation. Often the focus has been on task-agnostic approaches that generalize the language modeling objective. We propose question generation as a pretraining method, which better aligns with the text generation objectives. Our text generation models pretrained with this method are better at understanding the essence of the input and are better language models for the target task. When evaluated on two text generation tasks, abstractive summarization and answer-focused question generation, our models result in state-of-the-art performances in terms of automatic metrics. Human evaluators also found our summaries and generated questions to be more natural, concise and informative.

* 9 pages

Via

Access Paper or Ask Questions