Ido Dagan

SummHelper: Collaborative Human-Computer Summarization

Aug 16, 2023
Aviv Slobodkin, Niv Nachum, Shmuel Amar, Ori Shapira, Ido Dagan

Current approaches for text summarization are predominantly automatic, with rather limited space for human intervention and control over the process. In this paper, we introduce SummHelper, a 2-phase summarization assistant designed to foster human-machine collaboration. The initial phase involves content selection, where the system recommends potential content, allowing users to accept, modify, or introduce additional selections. The subsequent phase, content consolidation, involves SummHelper generating a coherent summary from these selections, which users can then refine using visual mappings between the summary and the source text. Small-scale user studies reveal the effectiveness of our application, with participants being especially appreciative of the balance between automated guidance and opportunities for personal input.
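
As a rough illustration of the two-phase flow described above, the following minimal Python sketch wires a content-selection step to a consolidation step. The recommend_content and consolidate functions are trivial placeholders (a length heuristic and plain concatenation), not the actual SummHelper models.

```python
def recommend_content(sentences, k=3):
    """Phase 1: propose candidate content (placeholder: the k longest sentences)."""
    return sorted(range(len(sentences)), key=lambda i: -len(sentences[i]))[:k]

def consolidate(sentences, selected):
    """Phase 2: fuse the selected content into a summary (placeholder: concatenation)."""
    return " ".join(sentences[i] for i in sorted(selected))

sentences = [
    "The storm hit the coast on Tuesday.",
    "Thousands of homes lost power.",
    "Officials expect repairs to take a week.",
]
proposal = set(recommend_content(sentences, k=2))
# The user may accept, drop, or add selections before consolidation.
user_selection = proposal | {1}
print(consolidate(sentences, user_selection))
```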

* Demo paper 

Revisiting Sentence Union Generation as a Testbed for Text Consolidation

May 24, 2023
Eran Hirsch, Valentina Pyatkin, Ruben Wolhandler, Avi Caciularu, Asi Shefer, Ido Dagan

Tasks involving text generation based on multiple input texts, such as multi-document summarization, long-form question answering and contemporary dialogue applications, challenge models for their ability to properly consolidate partly-overlapping multi-text information. However, these tasks entangle the consolidation phase with the often subjective and ill-defined content selection requirement, impeding proper assessment of models' consolidation capabilities. In this paper, we suggest revisiting the sentence union generation task as an effective, well-defined testbed for assessing text consolidation capabilities, decoupling the consolidation challenge from subjective content selection. To support research on this task, we present a refined annotation methodology and tools for crowdsourcing sentence unions, create the largest union dataset to date, and provide an analysis of its rich coverage of various consolidation aspects. We then propose a comprehensive evaluation protocol for union generation, including both human and automatic evaluation. Finally, as baselines, we evaluate state-of-the-art language models on the task and provide a detailed analysis of their capacity to address multi-text consolidation challenges, as well as their limitations.
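
To make the task concrete, here is a made-up instance (not taken from the released dataset): the union sentence must express all and only the information conveyed by the two input sentences, merging the overlap rather than repeating it.

```python
instance = {
    "sentence_1": "The mayor announced a new transit plan on Monday.",
    "sentence_2": "A new transit plan will add three bus lines downtown.",
    # The union keeps every fact from both sentences exactly once.
    "union": "On Monday, the mayor announced a new transit plan that will add "
             "three bus lines downtown.",
}
print(instance["union"])
```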

* Findings of the Association for Computational Linguistics (ACL 2023) 

Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering

May 24, 2023
Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan

The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document model with a novel cross-document question answering pre-training objective. To that end, given a set (or cluster) of topically-related documents, we systematically generate semantically-oriented questions from a salient sentence in one document and challenge the model, during pre-training, to answer these questions while "peeking" into other topically-related documents. In a similar manner, the model is also challenged to recover the sentence from which the question was generated, again while leveraging cross-document information. This novel multi-document QA formulation directs the model to better recover cross-text informational relations, and introduces a natural augmentation that artificially increases the pre-training data. Further, unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation (e.g., QA) and long text generation (e.g., summarization). Following this scheme, we pre-train our model, termed QAmden, and evaluate its performance across several multi-document tasks, including multi-document QA, summarization, and query-focused summarization, yielding improvements of up to 7% and significantly outperforming zero-shot GPT-3.5 and GPT-4.
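
A hedged sketch of how such a pre-training instance might be assembled is shown below. A crude cloze question stands in for the semantically-oriented question generation used in the paper, and the separator token is an arbitrary choice for illustration.

```python
def make_cloze_question(sentence):
    """Turn a salient sentence into a crude cloze-style question over its last word."""
    words = sentence.split()
    answer = words[-1].rstrip(".")
    question = " ".join(words[:-1] + ["____?"])
    return question, answer

cluster = [
    "Doc A: The spacecraft launched from Florida on Monday.",
    "Doc B: Engineers confirmed the launch site was in Florida.",
    "Doc C: The mission is planned to last six months.",
]
salient = "The spacecraft launched from Florida."
question, answer = make_cloze_question(salient)

# The model reads the question together with the *other* documents ("peeking"),
# and is trained to produce both the answer and the original salient sentence.
model_input = question + " </s> " + " </s> ".join(cluster[1:])
target = answer + " </s> " + salient
print(model_input)
print(target)
```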

* Accepted at ACL 2023; camera-ready version 

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

Apr 03, 2023
Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

Disagreement in natural language annotation has mostly been studied from the perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of lay annotators. For this purpose, we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.

* Accepted to TACL, pre-MIT Press publication version 

Controlled Text Reduction

Oct 24, 2022
Aviv Slobodkin, Paul Roit, Eran Hirsch, Ori Ernst, Ido Dagan

Producing a reduced version of a source text, as in generic or focused summarization, inherently involves two distinct subtasks: deciding on targeted content and generating a coherent text conveying it. While some popular approaches address summarization as a single end-to-end task, prominent works support decomposed modeling for individual subtasks. Further, semi-automated text reduction is also very appealing, where users may identify targeted content while models would generate a corresponding coherent summary. In this paper, we focus on the second subtask, of generating coherent text given pre-selected content. Concretely, we formalize Controlled Text Reduction as a standalone task, whose input is a source text with marked spans of targeted content ("highlighting"). A model then needs to generate a coherent text that includes all and only the target information. We advocate the potential of such models, both for modular fully-automatic summarization and for semi-automated human-in-the-loop use cases. To facilitate proper research, we crowdsource high-quality dev and test datasets for the task. Further, we automatically generate a larger "silver" training dataset from available summarization benchmarks, leveraging a pretrained summary-source alignment model. Finally, employing these datasets, we present a supervised baseline model, showing promising results and insightful analyses.
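
As a minimal sketch of what the task input could look like for a seq2seq model, the snippet below wraps highlighted character spans with marker tokens. The marker tokens, span positions, and target summary are illustrative assumptions, not the exact format of the released data or models.

```python
def mark_highlights(text, spans, open_tag="<h>", close_tag="</h>"):
    """Wrap each (start, end) character span of targeted content with marker tokens."""
    out, prev = [], 0
    for start, end in sorted(spans):
        out.append(text[prev:start])
        out.append(open_tag + text[start:end] + close_tag)
        prev = end
    out.append(text[prev:])
    return "".join(out)

source = "The river flooded overnight. Two bridges were closed. Traffic was rerouted."
highlights = [(0, 28), (29, 53)]  # keep the first two facts, drop the third
print(mark_highlights(source, highlights))
# A controlled-reduction model would be trained to output a coherent text
# covering exactly the highlighted content, e.g.:
# "The river flooded overnight, closing two bridges."
```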

* Accepted to EMNLP 2022 

How "Multi" is Multi-Document Summarization?

Oct 23, 2022
Ruben Wolhandler, Arie Cattan, Ori Ernst, Ido Dagan

The task of multi-document summarization (MDS) aims at models that, given multiple documents as input, are able to generate a summary that combines dispersed information originally spread across these documents. Accordingly, it is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on such dispersed information. In this paper, we argue for quantifying and assessing this expectation. To that end, we propose an automated measure for evaluating the degree to which a summary is "dispersed", in the sense of the number of source documents needed to cover its content. We apply our measure to empirically analyze several popular MDS datasets, with respect to their reference summaries, as well as the output of state-of-the-art systems. Our results show that certain MDS datasets barely require combining information from multiple documents, where a single document often covers the full summary content. Overall, we advocate using our metric for assessing and improving the degree to which summarization datasets require combining multi-document information, and similarly how well summarization models actually meet this challenge. Our code is available at https://github.com/ariecattan/multi_mds.
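
A hedged sketch of such a dispersion measure is given below: it greedily counts how many source documents are needed to cover the summary's content. Plain token overlap is a crude stand-in for the summary-source alignment used in the paper, and the texts are invented.

```python
def tokens(text):
    return set(text.lower().split())

def dispersion(summary, documents):
    """Greedy set-cover style count of documents needed to cover the summary tokens."""
    remaining = tokens(summary)
    doc_tokens = [tokens(d) for d in documents]
    used = 0
    while remaining:
        gains = [len(remaining & dt) for dt in doc_tokens]
        best = max(range(len(documents)), key=lambda i: gains[i])
        if gains[best] == 0:
            break  # remaining content is not covered by any document
        remaining -= doc_tokens[best]
        used += 1
    return used

docs = [
    "the senate passed the budget bill on friday",
    "protests continued in the capital over the weekend",
]
summary = "the senate passed the budget while protests continued in the capital"
print(dispersion(summary, docs))  # -> 2: both documents are needed
```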

* EMNLP 2022 

Cross-document Event Coreference Search: Task, Dataset and Modeling

Oct 23, 2022
Alon Eirew, Avi Caciularu, Ido Dagan

The task of Cross-document Coreference Resolution has traditionally been formulated as identifying all coreference links across a given set of documents. We propose an appealing, and often more applicable, complementary setup for the task: Cross-document Coreference Search, focusing in this paper on event coreference. Concretely, given a mention in context of an event of interest, considered as a query, the task is to find all coreferring mentions of the query event in a large document collection. To support research on this task, we create a corresponding dataset, derived from Wikipedia while leveraging annotations in the available Wikipedia Event Coreference dataset (WEC-Eng). Observing that the coreference search setup is largely analogous to the setting of Open Domain Question Answering, we adapt the prominent Dense Passage Retrieval (DPR) model to our setting as an appealing baseline. Finally, we present a novel model that integrates a powerful coreference scoring scheme into the DPR architecture, yielding improved performance.
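
The sketch below illustrates the bi-encoder retrieval shape of this setup: a query mention in context and candidate passages are encoded independently and ranked by similarity. A trivial bag-of-words encoder stands in for DPR's learned dense encoders, and the example texts are invented.

```python
from collections import Counter

def encode(text):
    """Placeholder encoder: a bag-of-words vector instead of a learned dense encoder."""
    return Counter(text.lower().split())

def score(query_vec, passage_vec):
    """Dot product between the two (sparse) vectors."""
    return sum(count * passage_vec[word] for word, count in query_vec.items())

# Query: an event mention ("struck") in its surrounding context.
query = "the earthquake struck off the coast early on saturday"
passages = [
    "rescue teams responded after the earthquake struck the coastal region",
    "the election results were announced on saturday evening",
]
q = encode(query)
ranked = sorted(passages, key=lambda p: -score(q, encode(p)))
print(ranked[0])  # the passage most likely to contain a coreferring event mention
```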

* EMNLP 2022 

QASem Parsing: Text-to-text Modeling of QA-based Semantics

May 23, 2022
Ayal Klein, Eran Hirsch, Ron Eliav, Valentina Pyatkin, Avi Caciularu, Ido Dagan

Several recent works have suggested representing semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks, namely QA-SRL, QANom and QADiscourse, each targeting a certain type of predication, and propose to regard them as jointly providing a comprehensive representation of textual information. To promote this goal, we investigate how to best utilize the power of sequence-to-sequence (seq2seq) pre-trained language models within the unique setup of semi-structured outputs, consisting of an unordered set of question-answer pairs. We examine different input and output linearization strategies, and assess the effect of multitask learning and of simple data augmentation techniques in the setting of imbalanced training data. Consequently, we release the first unified QASem parsing tool, practical for downstream applications that can benefit from an explicit, QA-based account of the information units in a text.
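
The sketch below shows one possible way to linearize an unordered set of question-answer pairs into a single target string for a seq2seq model. The separator tokens, the sorting step, and the example QA pairs are illustrative assumptions, not the format used by the released QASem parser.

```python
def linearize(qa_pairs, pair_sep=" ;; ", qa_sep=" // "):
    """Serialize an unordered set of (question, answer) pairs into one string.
    Sorting makes the target deterministic even though the set is unordered."""
    return pair_sep.join(q + qa_sep + a for q, a in sorted(qa_pairs))

source = "The company announced record profits after cutting costs."
qa_pairs = {
    ("who announced something?", "The company"),
    ("what did someone announce?", "record profits"),
    ("when did someone announce something?", "after cutting costs"),
}
print(source)
print(linearize(qa_pairs))
```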

Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering

Dec 16, 2021
Avi Caciularu, Ido Dagan, Jacob Goldberger, Arman Cohan

Long-range transformer models have achieved encouraging results on long-context question answering (QA) tasks. Such tasks often require reasoning over a long document, and they benefit from identifying a set of evidence spans (e.g., sentences) that support the answer to the question. In this work, we propose a novel method for equipping long-range transformers with an additional sequence-level objective for better identification of supporting evidence spans. We achieve this by adding a contrastive supervision signal during fine-tuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing the question-evidence similarity. The proposed additional loss yields consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks: HotpotQA and QAsper.
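
A simplified sketch of such a sequence-level contrastive signal is shown below: the question representation is pulled toward its supporting evidence sentences and away from the remaining sentences. Random vectors stand in for the transformer's question and sentence representations, and the exact loss form in the paper may differ.

```python
import torch
import torch.nn.functional as F

def evidence_contrastive_loss(question_vec, sentence_vecs, evidence_idx, temperature=0.1):
    """InfoNCE-style loss over sentence representations for one question."""
    sims = F.cosine_similarity(question_vec.unsqueeze(0), sentence_vecs, dim=-1) / temperature
    log_probs = F.log_softmax(sims, dim=-1)
    return -log_probs[evidence_idx].mean()

hidden = 16
question_vec = torch.randn(hidden)
sentence_vecs = torch.randn(5, hidden)   # 5 sentences from the long document
evidence_idx = torch.tensor([1, 3])      # sentences 1 and 3 support the answer
loss = evidence_contrastive_loss(question_vec, sentence_vecs, evidence_idx)
print(loss.item())
```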

A Proposition-Level Clustering Approach for Multi-Document Summarization

Dec 16, 2021
Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan

Text clustering methods have traditionally been incorporated into multi-document summarization (MDS) as a means of coping with considerable information repetition. Clusters were leveraged to indicate information saliency and to avoid redundancy. These methods focused on clustering sentences, even though closely related sentences usually also contain non-aligning information. In this work, we revisit the clustering approach, grouping together propositions for more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions. Our summarization method improves over the previous state-of-the-art MDS method on the DUC 2004 and TAC 2011 datasets, in both automatic ROUGE scores and human preference.
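
The sketch below mirrors the overall pipeline shape on toy data: group propositions into paraphrastic clusters (here, by simple lexical overlap rather than the learned components used in the paper) and emit one representative per cluster (here, fusion is reduced to picking the longest member).

```python
def overlap(a, b):
    """Fraction of the smaller proposition's tokens shared with the other one."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, min(len(ta), len(tb)))

def cluster_propositions(props, threshold=0.5):
    """Greedily assign each proposition to the first sufficiently similar cluster."""
    clusters = []
    for p in props:
        for c in clusters:
            if overlap(p, c[0]) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

props = [
    "The wildfire destroyed 40 homes",
    "40 homes were destroyed by the wildfire",
    "Evacuations began on Thursday",
]
for cluster in cluster_propositions(props):
    print(max(cluster, key=len))  # one representative sentence per cluster
```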
