Kathleen McKeown

Columbia University

Faithfulness-Aware Decoding Strategies for Abstractive Summarization

Mar 06, 2023

David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, Mohit Bansal

Abstract:Despite significant progress in understanding and improving faithfulness in abstractive summarization, the question of how decoding strategies affect faithfulness is less studied. We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization. We find a consistent trend where beam search with large beam sizes produces the most faithful summaries while nucleus sampling generates the least faithful ones. We propose two faithfulness-aware generation methods to further improve faithfulness over current generation techniques: (1) ranking candidates generated by beam search using automatic faithfulness metrics and (2) incorporating lookahead heuristics that produce a faithfulness score on the future summary. We show that both generation methods significantly improve faithfulness across two datasets as evaluated by four automatic faithfulness metrics and human evaluation. To reduce computational cost, we demonstrate a simple distillation approach that allows the model to generate faithful summaries with just greedy decoding. Our code is publicly available at https://github.com/amazon-science/faithful-summarization-generation

* EACL 2023 (17 pages)
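
The first method named in the abstract, ranking beam search candidates with an automatic faithfulness metric, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `overlap_score` is a hypothetical toy stand-in for a real faithfulness metric, and `rerank_by_faithfulness` and the example strings are made up for the sketch. The candidate list is assumed to come from any beam search decoder.

```python
from typing import Callable, List, Tuple


def overlap_score(source: str, summary: str) -> float:
    """Hypothetical toy metric: fraction of summary tokens that occur in the source.

    A real system would plug in an automatic faithfulness metric here instead.
    """
    source_tokens = set(source.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    supported = sum(1 for tok in summary_tokens if tok in source_tokens)
    return supported / len(summary_tokens)


def rerank_by_faithfulness(
    source: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, float]]:
    """Score every candidate against the source and sort from most to least faithful.

    Python's sort is stable, so candidates with equal scores keep their beam order.
    """
    scored = [(cand, score_fn(source, cand)) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumed example input: two candidates as they might come out of beam search.
source = "the cat sat on the mat"
beam_candidates = ["the dog sat on the mat", "the cat sat"]
ranked = rerank_by_faithfulness(source, beam_candidates)
best_summary = ranked[0][0]
```

Keeping the scorer as a plain function argument means the same reranking step works with any metric that maps a (source, summary) pair to a number.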

Detecting Harmful Agendas in News Articles

Jan 31, 2023

Melanie Subbiah, Amrita Bhattacharjee, Bobby Yilun Hua, Tharindu Kumarage, Huan Liu, Kathleen McKeown

Abstract:Manipulated news online is a growing problem which necessitates the use of automated systems to curtail its spread. We argue that while misinformation and disinformation detection have been studied, there has been a lack of investment in the important open challenge of detecting harmful agendas in news articles; identifying harmful agendas is critical to flag news campaigns with the greatest potential for real world harm. Moreover, due to real concerns around censorship, harmful agenda detectors must be interpretable to be effective. In this work, we propose this new task and release a dataset, NewsAgendas, of annotated news articles for agenda identification. We show how interpretable systems can be effective on this task and demonstrate that they can perform comparably to black-box models.

Benchmarking Large Language Models for News Summarization

Jan 31, 2023

Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto

Abstract:Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find instruction tuning, and not model size, is the key to the LLM's zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot and finetuning performance. To better evaluate LLMs, we perform human evaluation over high-quality summaries we collect from freelance writers. Despite major stylistic differences such as the amount of paraphrasing, we find that LLM summaries are judged to be on par with human written summaries.

SWING: Balancing Coverage and Faithfulness for Dialogue Summarization

Jan 25, 2023

Kung-Hsiang Huang, Siffi Singh, Xiaofei Ma, Wei Xiao, Feng Nan, Nicholas Dingwall, William Yang Wang, Kathleen McKeown

Abstract:Missing information is a common issue of dialogue summarization where some information in the reference summaries is not covered in the generated summaries. To address this issue, we propose to utilize natural language inference (NLI) models to improve coverage while avoiding introducing factual inconsistencies. Specifically, we use NLI to compute fine-grained training signals to encourage the model to generate content in the reference summaries that have not been covered, as well as to distinguish between factually consistent and inconsistent generated sentences. Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach in balancing coverage and faithfulness, validated with automatic metrics and human evaluations. Additionally, we compute the correlation between commonly used automatic metrics with human judgments in terms of three different dimensions regarding coverage and factual consistency to provide insight into the most suitable metric for evaluating dialogue summaries.

* Accepted by Findings of EACL 2023

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

Dec 20, 2022

Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown

Abstract:Given the success with in-context learning of large pre-trained language models, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models. We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge to the smaller models. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm. In-context learning objectives achieve the best performance when combined with language modeling objectives.

Legal and Political Stance Detection of SCOTUS Language

Nov 21, 2022

Noah Bergam, Emily Allaway, Kathleen McKeown

Abstract:We analyze publicly available US Supreme Court documents using automated stance detection. In the first phase of our work, we investigate the extent to which the Court's public-facing language is political. We propose and calculate two distinct ideology metrics of SCOTUS justices using oral argument transcripts. We then compare these language-based metrics to existing social scientific measures of the ideology of the Supreme Court and the public. Through this cross-disciplinary analysis, we find that justices who are more responsive to public opinion tend to express their ideology during oral arguments. This observation provides a new kind of evidence in favor of the attitudinal change hypothesis of Supreme Court justice behavior. As a natural extension of this political stance detection, we propose the more specialized task of legal stance detection with our new dataset SC-stance, which matches written opinions to legal questions. We find competitive performance on this dataset using language adapters trained on legal documents.

* Natural Legal Language Processing Workshop at EMNLP 2022

CREATIVESUMM: Shared Task on Automatic Summarization for Creative Writing

Nov 10, 2022

Divyansh Agarwal, Alexander R. Fabbri, Simeng Han, Wojciech Kryscinski, Faisal Ladhak, Bryan Li, Kathleen McKeown, Dragomir Radev, Tianyi Zhang, Sam Wiseman

Abstract:This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires making complex literary interpretations, as well as understanding non-trivial temporal dependencies in texts containing varied styles of plot development and narrative structure. This poses unique challenges and is yet underexplored for text summarization systems. In this shared task, we introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts. We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions. As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total. We discuss the submissions and the baselines for each sub-task in this paper, along with directions for facilitating future work in the field.

* 4 pages + 3 for references and appendix

Novel Chapter Abstractive Summarization using Spinal Tree Aware Sub-Sentential Content Selection

Nov 09, 2022

Hardy Hardy, Miguel Ballesteros, Faisal Ladhak, Muhammad Khalifa, Vittorio Castelli, Kathleen McKeown

Abstract:Summarizing novel chapters is a difficult task due to the input length and the fact that sentences that appear in the desired summaries draw content from multiple places throughout the chapter. We present a pipelined extractive-abstractive approach where the extractive step filters the content that is passed to the abstractive component. Extremely lengthy input also results in a highly skewed dataset towards negative instances for extractive summarization; we thus adopt a margin ranking loss for extraction to encourage separation between positive and negative examples. Our extraction component operates at the constituent level; our approach to this problem enriches the text with spinal tree information which provides syntactic context (in the form of constituents) to the extraction model. We show an improvement of 3.71 Rouge-1 points over best results reported in prior work on an existing novel chapter dataset.
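
The margin ranking loss mentioned in the abstract can be illustrated with the standard pairwise hinge form, which penalizes any positive/negative pair whose score gap is smaller than a margin. This is a sketch under assumptions: the paper's exact formulation, margin value, and pairing scheme are not given here, and the function name and example scores are hypothetical.

```python
from typing import List


def margin_ranking_loss(
    pos_scores: List[float],
    neg_scores: List[float],
    margin: float = 1.0,
) -> float:
    """Mean pairwise hinge loss: max(0, margin - (pos - neg)) over all pos/neg pairs.

    The loss is zero only when every positive outscores every negative
    by at least `margin`.
    """
    pair_losses = [
        max(0.0, margin - (pos - neg))
        for pos in pos_scores
        for neg in neg_scores
    ]
    if not pair_losses:
        return 0.0
    return sum(pair_losses) / len(pair_losses)


# Assumed example: one positive constituent and two negatives.
# Pair (2.0, 0.5) has a gap of 1.5, which clears the margin, so it costs 0.
# Pair (2.0, 1.5) has a gap of 0.5, which falls short by 0.5.
example_loss = margin_ranking_loss([2.0], [0.5, 1.5], margin=1.0)
```

Because the loss depends only on score gaps within each pair, it is not dominated by the sheer count of negatives in the way a per-instance classification loss can be, which fits the skewed setting the abstract describes.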

SafeText: A Benchmark for Exploring Physical Safety in Language Models

Oct 18, 2022

Sharon Levy, Emily Allaway, Melanie Subbiah, Lydia Chilton, Desmond Patton, Kathleen McKeown, William Yang Wang

Abstract:Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe. One such type of safety that has been scarcely studied is commonsense physical safety, i.e. text that is not explicitly violent and requires additional commonsense knowledge to comprehend that it leads to physical harm. We create the first benchmark dataset, SafeText, comprising real-life scenarios with paired safe and physically unsafe pieces of advice. We utilize SafeText to empirically study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks. We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice. As a result, we argue for further studies of safety and the assessment of commonsense physical safety in models before release.

* Accepted to EMNLP 2022

Mitigating Covertly Unsafe Text within Natural Language Systems

Oct 17, 2022

Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, John Judge, Desmond Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang

Abstract:An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text. Then, we further break down this category with respect to the system's information and discuss solutions to mitigate the generation of text in each of these subcategories. Ultimately, our work defines the problem of covertly unsafe language that causes physical harm and argues that this subtle yet dangerous issue needs to be prioritized by stakeholders and regulators. We highlight mitigation strategies to inspire future researchers to tackle this challenging problem and help improve safety within smart systems.

* To appear in Findings of the 2022 Conference on Empirical Methods in Natural Language Processing
