Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ido Dagan

Bar-Ilan University

How "Multi" is Multi-Document Summarization?

Oct 23, 2022

Ruben Wolhandler, Arie Cattan, Ori Ernst, Ido Dagan

Figure 1 for How "Multi" is Multi-Document Summarization?

Figure 2 for How "Multi" is Multi-Document Summarization?

Figure 3 for How "Multi" is Multi-Document Summarization?

Figure 4 for How "Multi" is Multi-Document Summarization?

Abstract:The task of multi-document summarization (MDS) aims at models that, given multiple documents as input, are able to generate a summary that combines disperse information, originally spread across these documents. Accordingly, it is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on such dispersed information. In this paper, we argue for quantifying and assessing this expectation. To that end, we propose an automated measure for evaluating the degree to which a summary is ``disperse'', in the sense of the number of source documents needed to cover its content. We apply our measure to empirically analyze several popular MDS datasets, with respect to their reference summaries, as well as the output of state-of-the-art systems. Our results show that certain MDS datasets barely require combining information from multiple documents, where a single document often covers the full summary content. Overall, we advocate using our metric for assessing and improving the degree to which summarization datasets require combining multi-document information, and similarly how summarization models actually meet this challenge. Our code is available in https://github.com/ariecattan/multi_mds.

* EMNLP 2022

Via

Access Paper or Ask Questions

Cross-document Event Coreference Search: Task, Dataset and Modeling

Oct 23, 2022

Alon Eirew, Avi Caciularu, Ido Dagan

Abstract:The task of Cross-document Coreference Resolution has been traditionally formulated as requiring to identify all coreference links across a given set of documents. We propose an appealing, and often more applicable, complementary set up for the task - Cross-document Coreference Search, focusing in this paper on event coreference. Concretely, given a mention in context of an event of interest, considered as a query, the task is to find all coreferring mentions for the query event in a large document collection. To support research on this task, we create a corresponding dataset, which is derived from Wikipedia while leveraging annotations in the available Wikipedia Event Coreference dataset (WEC-Eng). Observing that the coreference search setup is largely analogous to the setting of Open Domain Question Answering, we adapt the prominent Deep Passage Retrieval (DPR) model to our setting, as an appealing baseline. Finally, we present a novel model that integrates a powerful coreference scoring scheme into the DPR architecture, yielding improved performance.

* EMNLP 2022

Via

Access Paper or Ask Questions

QASem Parsing: Text-to-text Modeling of QA-based Semantics

May 23, 2022

Ayal Klein, Eran Hirsch, Ron Eliav, Valentina Pyatkin, Avi Caciularu, Ido Dagan

Figure 1 for QASem Parsing: Text-to-text Modeling of QA-based Semantics

Figure 2 for QASem Parsing: Text-to-text Modeling of QA-based Semantics

Figure 3 for QASem Parsing: Text-to-text Modeling of QA-based Semantics

Figure 4 for QASem Parsing: Text-to-text Modeling of QA-based Semantics

Abstract:Several recent works have suggested to represent semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks - namely, QA-SRL, QANom and QADiscourse, each targeting a certain type of predication - and propose to regard them as jointly providing a comprehensive representation of textual information. To promote this goal, we investigate how to best utilize the power of sequence-to-sequence (seq2seq) pre-trained language models, within the unique setup of semi-structured outputs, consisting of an unordered set of question-answer pairs. We examine different input and output linearization strategies, and assess the effect of multitask learning and of simple data augmentation techniques in the setting of imbalanced training data. Consequently, we release the first unified QASem parsing tool, practical for downstream applications who can benefit from an explicit, QA-based account of information units in a text.

Via

Access Paper or Ask Questions

Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering

Dec 16, 2021

Avi Caciularu, Ido Dagan, Jacob Goldberger, Arman Cohan

Figure 1 for Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering

Figure 2 for Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering

Figure 3 for Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering

Figure 4 for Utilizing Evidence Spans via Sequence-Level Contrastive Learning for Long-Context Question Answering

Abstract:Long-range transformer models have achieved encouraging results on long-context question answering (QA) tasks. Such tasks often require reasoning over a long document, and they benefit from identifying a set of evidence spans (e.g., sentences) that provide supporting evidence for addressing the question. In this work, we propose a novel method for equipping long-range transformers with an additional sequence-level objective for better identification of supporting evidence spans. We achieve this by proposing an additional contrastive supervision signal in finetuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing the question-evidence similarity. The proposed additional loss exhibits consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks - HotpotQA and QAsper.

Via

Access Paper or Ask Questions

A Proposition-Level Clustering Approach for Multi-Document Summarization

Dec 16, 2021

Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan

Figure 1 for A Proposition-Level Clustering Approach for Multi-Document Summarization

Figure 2 for A Proposition-Level Clustering Approach for Multi-Document Summarization

Figure 3 for A Proposition-Level Clustering Approach for Multi-Document Summarization

Figure 4 for A Proposition-Level Clustering Approach for Multi-Document Summarization

Abstract:Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition. Clusters were leveraged to indicate information saliency and to avoid redundancy. These methods focused on clustering sentences, even though closely related sentences also usually contain non-aligning information. In this work, we revisit the clustering approach, grouping together propositions for more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions. Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and human preference.

Via

Access Paper or Ask Questions

Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Oct 09, 2021

Daniela Brook Weiss, Paul Roit, Ori Ernst, Ido Dagan

Figure 1 for Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Figure 2 for Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Figure 3 for Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Figure 4 for Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Abstract:NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts. For example, in multi-document summarization it is crucial to identify salient information across texts and then generate a non-redundant summary, while facing repeated and usually differently-phrased salient content. To facilitate researching such challenges, the sentence-level task of \textit{sentence fusion} was proposed, yet previous datasets for this task were very limited in their size and scope. In this paper, we revisit and substantially extend previous dataset creation efforts. With careful modifications, relabeling and employing complementing data sources, we were able to triple the size of a notable earlier dataset. Moreover, we show that our extended version uses more representative texts for multi-document tasks and provides a larger and more diverse training set, which substantially improves model training.

Via

Access Paper or Ask Questions

Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset

Oct 03, 2021

Ori Shapira, Ramakanth Pasunuru, Ido Dagan, Yael Amsterdamer

Figure 1 for Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset

Figure 2 for Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset

Figure 3 for Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset

Figure 4 for Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset

Abstract:Keyphrase extraction has been comprehensively researched within the single-document setting, with an abundance of methods and a wealth of datasets. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no dataset existed for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To initiate this pursuit, we present here the first literature review and the first dataset for the task, MK-DUC-01, which can serve as a new benchmark. We test several keyphrase extraction baselines on our data and show their results.

Via

Access Paper or Ask Questions

QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

Sep 26, 2021

Daniela Brook Weiss, Paul Roit, Ayal Klein, Ori Ernst, Ido Dagan

Figure 1 for QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

Figure 2 for QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

Figure 3 for QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

Figure 4 for QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

Abstract:Multi-text applications, such as multi-document summarization, are typically required to model redundancies across related texts. Current methods confronting consolidation struggle to fuse overlapping information. In order to explicitly represent content overlap, we propose to align predicate-argument relations across texts, providing a potential scaffold for information consolidation. We go beyond clustering coreferring mentions, and instead model overlap with respect to redundancy at a propositional level, rather than merely detecting shared referents. Our setting exploits QA-SRL, utilizing question-answer pairs to capture predicate-argument relations, facilitating laymen annotation of cross-text alignments. We employ crowd-workers for constructing a dataset of QA-based alignments, and present a baseline QA alignment model trained over our dataset. Analyses show that our new task is semantically challenging, capturing content overlap beyond lexical similarity and complements cross-document coreference with proposition-level links, offering potential use for downstream tasks.

* Accepted to EMNLP 2021, Main Conference

Via

Access Paper or Ask Questions

iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

Sep 23, 2021

Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Ido Dagan

Figure 1 for iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

Figure 2 for iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

Figure 3 for iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

Figure 4 for iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

Abstract:We introduce iFacetSum, a web application for exploring topical document sets. iFacetSum integrates interactive summarization together with faceted search, by providing a novel faceted navigation scheme that yields abstractive summaries for the user's selections. This approach offers both a comprehensive overview as well as concise details regarding subtopics of choice. Fine-grained facets are automatically produced based on cross-document coreference pipelines, rendering generic concepts, entities and statements surfacing in the source texts. We analyze the effectiveness of our application through small-scale user studies, which suggest the usefulness of our approach.

* Proceedings of EMNLP 2021, System Demonstrations. 7 pages and an appendix

Via

Access Paper or Ask Questions

Asking It All: Generating Contextualized Questions for any Semantic Role

Sep 10, 2021

Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan

Figure 1 for Asking It All: Generating Contextualized Questions for any Semantic Role

Figure 2 for Asking It All: Generating Contextualized Questions for any Semantic Role

Figure 3 for Asking It All: Generating Contextualized Questions for any Semantic Role

Figure 4 for Asking It All: Generating Contextualized Questions for any Semantic Role

Abstract:Asking questions about a situation is an inherent step towards understanding it. To this end, we introduce the task of role question generation, which, given a predicate mention and a passage, requires producing a set of questions asking about all possible semantic roles of the predicate. We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage. Unlike most existing approaches to question generation, our approach does not require conditioning on existing answers in the text. Instead, we condition on the type of information to inquire about, regardless of whether the answer appears explicitly in the text, could be inferred from it, or should be sought elsewhere. Our evaluation demonstrates that we generate diverse and well-formed questions for a large, broad-coverage ontology of predicates and roles.

* Accepted as a long paper to EMNLP 2021, Main Conference

Via

Access Paper or Ask Questions