Peter J. Liu

Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

Dec 20, 2022

Kundan Krishna, Yao Zhao, Jie Ren, Balaji Lakshminarayanan, Jiaming Luo, Mohammad Saleh, Peter J. Liu

Abstract: The evaluation of abstractive summarization models typically uses test data that is distributed identically to the training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a lightweight method for detecting and removing such noise in the input during model inference without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points.

Out-of-Distribution Detection and Selective Generation for Conditional Language Models

Sep 30, 2022

Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mohammad Saleh, Balaji Lakshminarayanan, Peter J. Liu

Abstract: Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and provide overly confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs as the prediction is done auto-regressively over many steps. Furthermore, the space of potential low-quality outputs is larger as arbitrary text can be generated, and it is important to know when to trust the generated output. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation (analogous to selective prediction for classification) of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.
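
The selective generation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the OOD score is computed, so the score is taken as a given number per example, and the function name `selective_generate` and the `abstain_fraction` parameter are hypothetical.

```python
def selective_generate(outputs, ood_scores, abstain_fraction):
    """Keep outputs with the lowest OOD scores and abstain (None) on the rest.

    `ood_scores` are assumed to be "higher means more out-of-distribution";
    how they are produced is outside the scope of this sketch.
    """
    if len(outputs) != len(ood_scores):
        raise ValueError("outputs and ood_scores must have the same length")
    n_abstain = int(len(outputs) * abstain_fraction)
    # Indices sorted from most to least out-of-distribution.
    order = sorted(range(len(outputs)), key=lambda i: ood_scores[i], reverse=True)
    abstained = set(order[:n_abstain])
    return [None if i in abstained else out for i, out in enumerate(outputs)]


kept = selective_generate(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.2], 0.5)
print(kept)
```

Abstaining on a fixed fraction rather than a fixed threshold is a design choice made here for simplicity; a threshold tuned on held-out data would work the same way.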

Calibrating Sequence likelihood Improves Conditional Language Generation

Sep 30, 2022

Yao Zhao, Misha Khalman, Rishabh Joshi, Shashi Narayan, Mohammad Saleh, Peter J. Liu

Abstract: Conditional language models are predominantly trained with maximum likelihood estimation (MLE), giving probability mass to sparsely observed target sequences. While MLE-trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality. This has been empirically observed in beam search decoding as output quality degrading with large beam sizes, and decoding strategies benefiting from heuristics such as length normalization and repetition-blocking. In this work, we introduce sequence likelihood calibration (SLiC), where the likelihoods of model-generated sequences are calibrated to better align with reference sequences in the model's latent space. With SLiC, decoding heuristics become unnecessary and decoding candidates' quality significantly improves regardless of the decoding method. Furthermore, SLiC shows no sign of diminishing returns with model scale, and presents alternative ways to improve quality with limited training and inference budgets. With SLiC, we exceed or match SOTA results on a wide range of generation tasks spanning abstractive summarization, question generation, abstractive question answering and data-to-text generation, even with modest-sized models.

Investigating Efficiently Extending Transformers for Long Input Summarization

Aug 08, 2022

Jason Phang, Yao Zhao, Peter J. Liu

Abstract: While large pretrained Transformer models have proven highly capable at tackling natural language tasks, handling long sequence inputs continues to be a significant challenge. One such task is long input summarization, where inputs are longer than the maximum input context of most pretrained models. Through an extensive set of experiments, we investigate what model architectural changes and pretraining paradigms can most efficiently adapt a pretrained Transformer for long input summarization. We find that a staggered, block-local Transformer with global encoder tokens strikes a good balance of performance and efficiency, and that an additional pretraining phase on long sequences meaningfully improves downstream summarization performance. Based on our findings, we introduce PEGASUS-X, an extension of the PEGASUS model with additional long input pretraining to handle inputs of up to 16K tokens. PEGASUS-X achieves strong performance on long input summarization tasks comparable with much larger models while adding few additional parameters and not requiring model parallelism to train.

SMART: Sentences as Basic Units for Text Evaluation

Aug 01, 2022

Reinald Kim Amplayo, Peter J. Liu, Yao Zhao, Shashi Narayan

Abstract: Widely used evaluation metrics for text generation either do not work well with longer texts or fail to evaluate all aspects of text quality. In this paper, we introduce a new metric called SMART to mitigate such limitations. Specifically, we treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences. Candidate sentences are also compared to sentences in the source documents to allow grounding (e.g., factuality) evaluation. Our results show that the system-level correlations of our proposed metric with a model-based matching function outperform those of all competing metrics on the SummEval summarization meta-evaluation dataset, while the same metric with a string-based matching function is competitive with current model-based metrics. The latter does not use any neural model, which is useful during model development phases where resources can be limited and fast evaluation is required. Finally, we also conducted extensive analyses showing that our proposed metrics work well with longer summaries and are less biased towards specific models.

* code coming soon
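
As a rough illustration of sentence-level soft matching (not the authors' code and not the exact SMART definition), the sketch below scores a candidate against a reference using sentences as the matching unit. The string-based matcher `token_f1`, the best-match aggregation, and the name `sentence_level_f1` are all assumptions made for this example; the abstract does not specify them.

```python
from collections import Counter


def token_f1(a, b):
    """String-based sentence matcher: F1 over whitespace tokens (an assumption)."""
    ta, tb = a.lower().split(), b.lower().split()
    common = sum((Counter(ta) & Counter(tb)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(ta), common / len(tb)
    return 2 * precision * recall / (precision + recall)


def sentence_level_f1(candidate, reference, match=token_f1):
    """Soft-match lists of sentences; each sentence takes its best match score."""
    if not candidate or not reference:
        return 0.0
    precision = sum(max(match(c, r) for r in reference) for c in candidate) / len(candidate)
    recall = sum(max(match(r, c) for c in candidate) for r in reference) / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


score = sentence_level_f1(["a b"], ["a b", "c d"])
print(round(score, 4))
```

Passing a different `match` function (for example, one backed by a neural similarity model) changes the matcher without changing the aggregation, which mirrors the string-based versus model-based distinction the abstract draws.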

SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization

Jun 18, 2020

Yao Zhao, Mohammad Saleh, Peter J. Liu

Abstract: Most prior work in the sequence-to-sequence paradigm focused on datasets with input sequence lengths in the hundreds of tokens due to the computational constraints of common RNN and Transformer architectures. In this paper, we study long-form abstractive text summarization, a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens. We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment. Using only the original documents and summaries, we derive proxy labels that provide weak supervision for extractive layers simultaneously with regular supervision from abstractive summaries. The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text. Since content selection is explicit in the SEAL model, a desirable side effect is that the selection can be inspected for enhanced interpretability.

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Dec 18, 2019

Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu

Abstract: Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples.
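
The gap-sentence objective described in this abstract can be sketched as a data-preparation step: pick "important" sentences, replace them with a mask token in the input, and use them as the target. This is a minimal sketch under stated assumptions, not the released implementation: the importance score (word overlap with the rest of the document), the placeholder token `<mask_1>`, and the function name `gap_sentence_example` are all choices made here for illustration.

```python
def gap_sentence_example(sentences, num_masked=1, mask_token="<mask_1>"):
    """Build a (source, target) pair by masking the highest-scoring sentences."""

    def tokens(sentence):
        return set(sentence.lower().split())

    def importance(i):
        # Assumed proxy: fraction of a sentence's words that also occur elsewhere.
        rest = set().union(*(tokens(s) for j, s in enumerate(sentences) if j != i))
        own = tokens(sentences[i])
        return len(own & rest) / max(len(own), 1)

    ranked = sorted(range(len(sentences)), key=importance, reverse=True)
    chosen = sorted(ranked[:num_masked])  # keep document order in the target
    source = " ".join(mask_token if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in chosen)
    return source, target


doc = ["the cat sat on the mat", "a dog barked", "the cat sat quietly"]
source, target = gap_sentence_example(doc)
print(source)
print(target)
```

In this toy document the third sentence shares the largest fraction of its words with the rest, so it is masked in the source and becomes the target.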

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Oct 24, 2019

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.

SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders

Oct 02, 2019

Peter J. Liu, Yu-An Chung, Jie Ren

Abstract: We propose an end-to-end neural model for zero-shot abstractive text summarization of paragraphs, and introduce a benchmark task, ROCSumm, based on ROCStories, a subset for which we collected human summaries. In this task, five-sentence stories (paragraphs) are summarized with one sentence, using human summaries only for evaluation. We show results for extractive and human baselines to demonstrate a large abstractive gap in performance. Our model, SummAE, consists of a denoising auto-encoder that embeds sentences and paragraphs in a common space, from which either can be decoded. Summaries for paragraphs are generated by decoding a sentence from the paragraph representations. We find that traditional sequence-to-sequence auto-encoders fail to produce good summaries and describe how specific architectural choices and pre-training techniques can significantly improve performance, outperforming extractive baselines. The data, training, evaluation code, and best model weights are open-sourced.

* first two authors contributed equally

Likelihood Ratios for Out-of-Distribution Detection

Jun 07, 2019

Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan

Abstract: Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based on genomic sequences, which holds the promise of early detection of diseases, but requires a model that can output low-confidence predictions on OOD genomic sequences from new bacteria that were not present in the training data. We introduce a genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem. We investigate approaches to OOD detection based on deep generative models and observe that the likelihood score is heavily affected by population-level background statistics. We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. We demonstrate the generality of the proposed method by showing that it significantly improves OOD detection when applied to deep generative models of images.
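
The likelihood ratio score described in this abstract can be sketched with toy models. Everything below is illustrative rather than the authors' implementation: a smoothed per-symbol (unigram) model stands in for a deep generative model, and `fit_unigram`, `perturb`, and `likelihood_ratio` are hypothetical names. The abstract does not say how the background model is obtained; training it on randomly perturbed inputs, as `perturb` would allow, is an assumption of this sketch.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"


def fit_unigram(seqs):
    """Add-one smoothed symbol probabilities (toy stand-in for a generative model)."""
    counts = Counter(ch for s in seqs for ch in s)
    total = sum(counts[a] for a in ALPHABET) + len(ALPHABET)
    return {a: (counts[a] + 1) / total for a in ALPHABET}


def log_likelihood(seq, model):
    return sum(math.log(model[ch]) for ch in seq)


def perturb(seq, rate, rng):
    """Replace each symbol by a uniformly random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in seq)


def likelihood_ratio(seq, full, background):
    """log p_full(seq) - log p_background(seq); higher suggests in-distribution."""
    return log_likelihood(seq, full) - log_likelihood(seq, background)


rng = random.Random(0)
train = ["AAAAAAAA", "AAAAAACA", "AAAGAAAA"]
full = fit_unigram(train)
background = fit_unigram([perturb(s, 0.5, rng) for s in train])
print(likelihood_ratio("AAAAAAAA", full, background))
print(likelihood_ratio("CGTCGTCG", full, background))
```

Subtracting the background log-likelihood cancels the contribution of statistics that both models capture, so the score depends on what the full model learned beyond the background.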
