Jiacheng Xu

Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

May 25, 2022
Liyan Tang, Tanya Goyal, Alexander R. Fabbri, Philippe Laban, Jiacheng Xu, Semih Yavuz, Wojciech Kryściński, Justin F. Rousseau, Greg Durrett

The propensity of abstractive summarization systems to make factual errors has been the subject of significant study, including work on models to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, error detectors, and annotated benchmarks makes factuality evaluation a moving target; it is hard to get a clear picture of how techniques compare. In this work, we collect labeled factuality errors from across nine datasets of annotated summary outputs and stratify them in a new way, focusing on what kind of base summarization model was used. To support finer-grained analysis, we unify the labeled error types into a single taxonomy and project each of the datasets' errors into this shared labeled space. We then contrast five state-of-the-art error detection methods on this benchmark. Our findings show that benchmarks built on modern summary outputs (those from pre-trained models) show significantly different results than benchmarks using pre-Transformer models. Furthermore, no one factuality technique is superior in all settings or for all error types, suggesting that system developers should take care to choose the right system for their task at hand.

* 11 pages (15 with references and appendix), 4 figures, 8 tables

Massive-scale Decoding for Text Generation using Lattices

Dec 14, 2021
Jiacheng Xu, Greg Durrett

Neural text generation models like those used for summarization and translation generate high-quality outputs, but often concentrate around a mode when what we really want is a diverse set of options. We present a search algorithm to construct lattices encoding a massive number of generation options. First, we restructure decoding as a best-first search, which explores the space differently than beam search and improves efficiency by avoiding pruning paths. Second, we revisit the idea of hypothesis recombination: we can identify pairs of similar generation candidates during search and merge them as an approximation. On both document summarization and machine translation, we show that our algorithm encodes hundreds to thousands of diverse options that remain grammatical and high-quality into one linear-sized lattice. This algorithm provides a foundation for building downstream generation applications on top of massive-scale diverse outputs.
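The two ideas in this abstract, best-first expansion and hypothesis recombination, can be illustrated with a toy sketch. Everything here is an assumption for illustration: `toy_model`, `best_first_lattice`, the expansion budget, and especially the merge criterion (same length and same last `ngram` tokens), which the abstract does not specify. It is not the authors' implementation.

```python
import heapq
import math

def toy_model(history):
    """Hypothetical next-token log-probabilities, keyed on the last token only."""
    table = {
        "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
        "the": {"cat": math.log(0.5), "dog": math.log(0.5)},
        "a": {"cat": math.log(0.5), "dog": math.log(0.5)},
        "cat": {"sat": 0.0},
        "dog": {"sat": 0.0},
        "sat": {"</s>": 0.0},
    }
    return table[history[-1]]

def best_first_lattice(next_logprobs, start="<s>", end="</s>",
                       max_expansions=50, ngram=1):
    """Build lattice edges (from_node, token, to_node) with a best-first search.

    Hypotheses that share a length and their last `ngram` tokens are merged
    into one node (an assumed, simplified recombination rule).
    """
    edges = set()
    merged = {}          # recombination key -> existing node id
    next_id = 1
    tiebreak = 0
    # Heap entries: (negative log-prob so far, tiebreak, node id, token history)
    heap = [(0.0, tiebreak, 0, (start,))]
    expansions = 0
    while heap and expansions < max_expansions:
        cost, _, node, history = heapq.heappop(heap)
        if history[-1] == end:
            continue     # finished hypothesis, nothing to expand
        expansions += 1
        for token, logprob in next_logprobs(history).items():
            new_history = history + (token,)
            key = (len(new_history), new_history[-ngram:])
            if key in merged:
                # Recombine: point at the existing node instead of branching.
                edges.add((node, token, merged[key]))
                continue
            merged[key] = next_id
            edges.add((node, token, next_id))
            tiebreak += 1
            heapq.heappush(heap, (cost - logprob, tiebreak, next_id, new_history))
            next_id += 1
    return edges

def count_paths(edges, node=0):
    """Count distinct paths from `node` to any node without outgoing edges."""
    children = [dst for src, _, dst in edges if src == node]
    if not children:
        return 1
    return sum(count_paths(edges, child) for child in children)

lattice = best_first_lattice(toy_model)
```

In this toy run the lattice stores four complete sequences in nine edges, because the "the ..." and "a ..." branches are merged once they reach the same suffix at the same position.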

* 19 pages, 13 figures, see https://github.com/jiacheng-xu/lattice-generation

Training Dynamics for Text Summarization Models

Oct 15, 2021
Tanya Goyal, Jiacheng Xu, Junyi Jessy Li, Greg Durrett

Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training models or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on news summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that properties such as copy behavior are learnt earlier in the training process and these observations are robust across domains. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, and this behavior is more varied across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn and second, disregarding low-loss tokens that are learnt very quickly. This simple training modification allows us to configure our model to achieve different goals, such as improving factuality or improving abstractiveness.
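As a rough illustration of the two training modifications described above, the sketch below averages per-token losses after dropping the tokens at one extreme. The function name `masked_mean_loss`, the `fraction` parameter, and the choice to drop a fixed fraction are assumptions made for this example; the abstract does not say how high-loss or low-loss tokens are selected.

```python
def masked_mean_loss(token_losses, drop="high", fraction=0.25):
    """Mean of per-token losses after ignoring a fraction at one extreme.

    drop="high" ignores the hardest (highest-loss) tokens;
    drop="low" ignores the tokens learnt most easily (lowest-loss).
    `fraction` is a hypothetical knob, not a value from the paper.
    """
    if drop not in ("high", "low"):
        raise ValueError("drop must be 'high' or 'low'")
    if not token_losses or not 0.0 <= fraction < 1.0:
        raise ValueError("need at least one loss and 0 <= fraction < 1")
    n_drop = int(len(token_losses) * fraction)
    ordered = sorted(token_losses)
    if drop == "high":
        kept = ordered[:len(ordered) - n_drop]
    else:
        kept = ordered[n_drop:]
    return sum(kept) / len(kept)

# Hypothetical per-token losses for one training example.
losses = [0.1, 0.2, 0.3, 4.0]
without_hard = masked_mean_loss(losses, drop="high")
without_easy = masked_mean_loss(losses, drop="low")
```

In a real training loop the same masking would be applied to the per-token loss tensor before the backward pass; plain Python lists are used here only to keep the example self-contained.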

* preprint

Aspect-Oriented Summarization through Query-Focused Extraction

Oct 15, 2021
Ojas Ahuja, Jiacheng Xu, Akshay Gupta, Kevin Horecka, Greg Durrett

A reader interested in a particular topic might be interested in summarizing documents on that subject with a particular focus, rather than simply seeing generic summaries produced by most summarization systems. While query-focused summarization has been explored in prior work, this is often approached from the standpoint of document-specific questions or on synthetic data. Real users' needs often fall more closely into aspects, broad topics in a dataset the user is interested in rather than specific queries. In this paper, we collect a dataset of realistic aspect-oriented test cases, AspectNews, which covers different subtopics about articles in news sub-domains. We then investigate how query-focused methods, for which we can construct synthetic data, can handle this aspect-oriented setting: we benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model. We evaluate on two aspect-oriented datasets and find this approach yields (a) focused summaries, better than those from a generic summarization system, which go beyond simple keyword matching; (b) a system sensitive to the choice of keywords.

Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Jun 08, 2021
Aditya Gupta, Jiacheng Xu, Shyam Upadhyay, Diyi Yang, Manaal Faruqui

Disfluencies are an under-studied topic in NLP, even though they are ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual disfluencies in previously fluent questions. Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than what was necessary in prior datasets. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when tested on Disfl-QA in a zero-shot setting. We show that data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning. We argue that we need large-scale disfluency datasets in order for NLP models to be robust to them. The dataset is publicly available at: https://github.com/google-research-datasets/disfl-qa.

* Findings of ACL 2021

Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution

Jun 03, 2021
Jiacheng Xu, Greg Durrett

Despite the prominence of neural abstractive summarization models, we know little about how they actually form summaries and how to understand where their decisions come from. We propose a two-step method to interpret summarization model decisions. We first analyze the model's behavior by ablating the full model to categorize each decoder decision into one of several generation modes: roughly, is the model behaving like a language model, is it relying heavily on the input, or is it somewhere in between? After isolating decisions that do depend on the input, we explore interpreting these decisions using several different attribution methods. We compare these techniques based on their ability to select content and reconstruct the model's predicted token from perturbations of the input, thus revealing whether highlighted attributions are truly important for the generation of the next token. While this machinery can be broadly useful even beyond summarization, we specifically demonstrate its capability to identify phrases the summarization model has memorized and determine where in the training pipeline this memorization happened, as well as study complex generation phenomena like sentence fusion on a per-instance basis.

* ACL 2021; 16 pages

Compressive Summarization with Plausibility and Salience Modeling

Oct 15, 2020
Shrey Desai, Jiacheng Xu, Greg Durrett

Compressive summarization systems typically rely on a crafted set of syntactic rules to determine what spans of possible summary sentences can be deleted, then learn a model of what to actually delete by optimizing for content selection (ROUGE). In this work, we propose to relax the rigid syntactic constraints on candidate spans and instead leave compression decisions to two data-driven criteria: plausibility and salience. Deleting a span is plausible if removing it maintains the grammaticality and factuality of a sentence, and spans are salient if they contain important information from the summary. Each of these is judged by a pre-trained Transformer model, and only deletions that are both plausible and not salient can be applied. When integrated into a simple extraction-compression pipeline, our method achieves strong in-domain results on benchmark summarization datasets, and human evaluation shows that the plausibility model generally selects for grammatical and factual deletions. Furthermore, the flexibility of our approach allows it to generalize cross-domain: our system fine-tuned on only 500 samples from a new domain can match or exceed an in-domain extractive model trained on much more data.
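The deletion rule stated above (apply a deletion only if it is plausible and the span is not salient) can be sketched as follows. The `Span` class, the scores, and the 0.5 thresholds are hypothetical stand-ins; in the paper both judgments come from pre-trained Transformer models, which are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str            # candidate span, exactly as it appears in the sentence
    plausibility: float  # assumed score: deleting it keeps the sentence grammatical and factual
    salience: float      # assumed score: the span carries important content

def select_deletions(spans, plausibility_threshold=0.5, salience_threshold=0.5):
    """Keep only spans whose deletion is plausible and which are not salient."""
    return [
        span for span in spans
        if span.plausibility >= plausibility_threshold
        and span.salience < salience_threshold
    ]

def compress(sentence, spans):
    """Remove every selected span once, then normalize whitespace."""
    for span in select_deletions(spans):
        sentence = sentence.replace(span.text, "", 1)
    return " ".join(sentence.split())

sentence = "The mayor, who took office in May, announced a new budget on Tuesday."
candidates = [
    Span(", who took office in May,", plausibility=0.9, salience=0.1),  # deleted
    Span(" on Tuesday", plausibility=0.8, salience=0.9),                # kept: salient
    Span("a new budget", plausibility=0.1, salience=0.2),               # kept: implausible
]
compressed = compress(sentence, candidates)
```

Only the first candidate passes both checks, so `compressed` keeps the budget and the date while dropping the relative clause.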

* Accepted to EMNLP 2020

Understanding Neural Abstractive Summarization Models via Uncertainty

Oct 15, 2020
Jiacheng Xu, Shrey Desai, Greg Durrett

An advantage of seq2seq abstractive summarization models is that they generate text in a free-form manner, but this flexibility makes it difficult to interpret model behavior. In this work, we analyze summarization decoders in both blackbox and whitebox ways by studying the entropy, or uncertainty, of the model's token-level predictions. For two strong pre-trained models, PEGASUS and BART, on two summarization datasets, we find a strong correlation between low prediction entropy and where the model copies tokens rather than generating novel text. The decoder's uncertainty also connects to factors like sentence position and syntactic distance between adjacent pairs of tokens, giving a sense of what factors make a context particularly selective for the model's next output token. Finally, we study the relationship of decoder uncertainty and attention behavior to understand how attention gives rise to these observed effects in the model. We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
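For readers unfamiliar with the quantity being studied, token-level prediction entropy is the Shannon entropy of the decoder's next-token distribution, H(p) = -Σ p_i log p_i. The sketch below computes it for two made-up distributions over a four-token vocabulary; the logits are illustrative and do not come from PEGASUS or BART.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(probs):
    """Shannon entropy in nats; zero-probability tokens contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A peaked distribution (the model is nearly certain, as when copying)
# versus a flat one (the model is maximally uncertain).
peaked = softmax([10.0, 0.0, 0.0, 0.0])
flat = softmax([1.0, 1.0, 1.0, 1.0])

peaked_entropy = token_entropy(peaked)
flat_entropy = token_entropy(flat)
```

The flat distribution reaches the maximum possible entropy for four tokens, log 4, while the peaked one is close to zero.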

* To appear in EMNLP 2020; code available at https://github.com/jiacheng-xu/text-sum-uncertainty

Discourse-Aware Neural Extractive Model for Text Summarization

Oct 30, 2019
Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu

Recently, BERT has been adopted in state-of-the-art text summarization models for document encoding. However, such BERT-based extractive models use the sentence as the minimal selection unit, which often results in redundant or uninformative phrases in the generated summaries. As BERT is pre-trained on sentence pairs, not documents, the long-range dependencies between sentences are not well captured. To address these issues, we present a graph-based discourse-aware neural summarization model, DiscoBert. By utilizing discourse segmentation to extract discourse units (instead of sentences) as candidates, DiscoBert provides finer granularity for extractive selection, which helps reduce redundancy in extracted summaries. Based on this, two discourse graphs are further proposed: (i) RST Graph based on RST discourse trees; and (ii) Coreference Graph based on coreference mentions in the document. DiscoBert first encodes the extracted discourse units with BERT, and then uses a graph convolutional network to capture the long-range dependencies among discourse units through the constructed graphs. Experimental results on two popular summarization datasets demonstrate that DiscoBert outperforms state-of-the-art methods by a significant margin.

Neural Extractive Text Summarization with Syntactic Compression

Feb 03, 2019
Jiacheng Xu, Greg Durrett

Recent neural network approaches to summarization are largely either sentence-extractive, choosing a set of sentences as the summary, or abstractive, generating the summary from a seq2seq model. In this work, we present a neural model for single-document summarization based on joint extraction and compression. Following recent successful extractive models, we frame the summarization problem as a series of local decisions. Our model chooses sentences from the document and then decides which of a set of compression options to apply to each selected sentence. We compute this set of options using discrete compression rules based on syntactic constituency parses; however, our approach is modular and can flexibly use any available source of compressions. For learning, we construct oracle extractive-compressive summaries that reflect uncertainty over our model's decision sequence, then learn both of our components jointly with this supervision. Experimental results on the CNN/Daily Mail and New York Times datasets show that our model achieves state-of-the-art performance on content selection evaluated by ROUGE. Moreover, human and manual evaluation show that our model's output generally remains grammatical.
