Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Event Cross Document Coreference Resolution

Link Prediction for Event Logs in the Process Industry

Aug 12, 2025

Anastasia Zhukova, Thomas Walton, Christian E. Matt, Bela Gipp

Abstract:Knowledge management (KM) is vital in the process industry for optimizing operations, ensuring safety, and enabling continuous improvement through effective use of operational data and past insights. A key challenge in this domain is the fragmented nature of event logs in shift books, where related records, e.g., entries documenting issues related to equipment or processes and the corresponding solutions, may remain disconnected. This fragmentation hinders the recommendation of previous solutions to the users. To address this problem, we investigate record linking (RL) as link prediction, commonly studied in graph-based machine learning, by framing it as a cross-document coreference resolution (CDCR) task enhanced with natural language inference (NLI) and semantic text similarity (STS) by shifting it into the causal inference (CI). We adapt CDCR, traditionally applied in the news domain, into an RL model to operate at the passage level, similar to NLI and STS, while accommodating the process industry's specific text formats, which contain unstructured text and structured record attributes. Our RL model outperformed the best versions of NLI- and STS-driven baselines by 28% (11.43 points) and 27% (11.21 points), respectively. Our work demonstrates how domain adaptation of the state-of-the-art CDCR models, enhanced with reasoning capabilities, can be effectively tailored to the process industry, improving data quality and connectivity in shift logs.

Via

Access Paper or Ask Questions

Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

Jun 23, 2024

Qiang Gao, Bobo Li, Zixiang Meng, Yunlong Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

Figure 1 for Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

Figure 2 for Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

Figure 3 for Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

Figure 4 for Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

Abstract:Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representation by extracting event arguments (such as location, time, agent, and patient), lacking the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to their underwhelming performance in determining coreference for the events where their argument information relies on long-distance dependencies. In light of these limitations, we propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents. Subsequently, cross-document heterogeneous graphs are constructed and GAT is utilized to learn the representations of events. Finally, a pair scorer calculates the similarity between each pair of events and co-referred events can be recognized using standard clustering algorithm. Additionally, as the existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, which comprises 53,066 event mentions and 4,476 clusters. After applying our model on the English and Chinese datasets respectively, it outperforms all baselines by large margins.

* LREC|COLING,Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation,2024,5907-5921

Via

Access Paper or Ask Questions

Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models

Jun 04, 2024

Qingkai Min, Qipeng Guo, Xiangkun Hu, Songfang Huang, Zheng Zhang, Yue Zhang

Abstract:Cross-document event coreference resolution (CDECR) involves clustering event mentions across multiple documents that refer to the same real-world events. Existing approaches utilize fine-tuning of small language models (SLMs) like BERT to address the compatibility among the contexts of event mentions. However, due to the complexity and diversity of contexts, these models are prone to learning simple co-occurrences. Recently, large language models (LLMs) like ChatGPT have demonstrated impressive contextual understanding, yet they encounter challenges in adapting to specific information extraction (IE) tasks. In this paper, we propose a collaborative approach for CDECR, leveraging the capabilities of both a universally capable LLM and a task-specific SLM. The collaborative strategy begins with the LLM accurately and comprehensively summarizing events through prompting. Then, the SLM refines its learning of event representations based on these insights during fine-tuning. Experimental results demonstrate that our approach surpasses the performance of both the large and small language models individually, forming a complementary advantage. Across various datasets, our approach achieves state-of-the-art performance, underscoring its effectiveness in diverse scenarios.

* Accepted to ACL-24 Main

Via

Access Paper or Ask Questions

Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Apr 13, 2024

Abhijnan Nath, Huma Jamil, Shafiuddin Rehan Ahmed, George Baker, Rahul Ghosh, James H. Martin, Nathaniel Blanchard, Nikhil Krishnaswamy

Figure 1 for Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Figure 2 for Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Figure 3 for Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Figure 4 for Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Abstract:Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear mapping method without finetuning and 3) an ensembling approach based on splitting mention pairs by semantic and discourse-level difficulty. We evaluate on 2 datasets: the augmented ECB+, and AIDA Phase 1. Our ensemble systems using cross-modal linear mapping establish an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing assumptions used, and establish a novel baseline on AIDA Phase 1. Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems, and highlight a need for more multimodal resources in the coreference resolution space.

* To appear at LREC-COLING 2024

Via

Access Paper or Ask Questions

A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Apr 02, 2024

Bowen Ding, Qingkai Min, Shengkun Ma, Yingjie Li, Linyi Yang, Yue Zhang

Figure 1 for A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Figure 2 for A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Figure 3 for A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Figure 4 for A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Abstract:Based on Pre-trained Language Models (PLMs), event coreference resolution (ECR) systems have demonstrated outstanding performance in clustering coreferential events across documents. However, the existing system exhibits an excessive reliance on the `triggers lexical matching' spurious pattern in the input mention pair text. We formalize the decision-making process of the baseline ECR system using a Structural Causal Model (SCM), aiming to identify spurious and causal associations (i.e., rationales) within the ECR task. Leveraging the debiasing capability of counterfactual data augmentation, we develop a rationale-centric counterfactual data augmentation method with LLM-in-the-loop. This method is specialized for pairwise input in the ECR system, where we conduct direct interventions on triggers and context to mitigate the spurious association while emphasizing the causation. Our approach achieves state-of-the-art performance on three popular cross-document ECR benchmarks and demonstrates robustness in out-of-domain scenarios.

* Accepted to NAACL-24 Main

Via

Access Paper or Ask Questions

Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

Jun 23, 2024

Qiang Gao, Zixiang Meng, Bobo Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

Figure 1 for Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

Figure 2 for Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

Figure 3 for Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

Figure 4 for Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

Abstract:Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information and the roles of different event arguments may be biased due to the influence of the information source. This paper addresses the limitations of traditional document-level event extraction by proposing the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events. We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events, where over 70% of them are cross-document. To build a benchmark, we propose a CDEE pipeline that includes 5 steps, namely event extraction, coreference resolution, entity normalization, role normalization and entity-role resolution. Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, suggesting the challenge of this task. Our work builds a new line of information extraction research and will attract new research attention.

* ACL2024(Findings)

Via

Access Paper or Ask Questions

Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation

Apr 04, 2024

Abhijnan Nath, Shadi Manafi, Avyakta Chelle, Nikhil Krishnaswamy

Figure 1 for Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation

Figure 2 for Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation

Figure 3 for Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation

Figure 4 for Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation

Abstract:In NLP, Event Coreference Resolution (ECR) is the task of connecting event clusters that refer to the same underlying real-life event, usually via neural systems. In this work, we investigate using abductive free-text rationales (FTRs) generated by modern autoregressive LLMs as distant supervision of smaller student models for cross-document coreference (CDCR) of events. We implement novel rationale-oriented event clustering and knowledge distillation methods for event coreference scoring that leverage enriched information from the FTRs for improved CDCR without additional annotation or expensive document clustering. Our model using coreference specific knowledge distillation achieves SOTA B3 F1 on the ECB+ and GVC corpora and we establish a new baseline on the AIDA Phase 1 corpus. Our code can be found at https://github.com/csu-signal/llama_cdcr

* To be published in NAACL 2024 Main

Via

Access Paper or Ask Questions

CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies

Nov 19, 2023

Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan

Abstract:Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent relations, etc. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open source tool allowing to incrementally construct both clusters and hierarchy simultaneously over any type of texts. This incremental approach significantly reduces annotation time compared to the common pairwise annotation approach and also guarantees maintaining transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a consolidation mode, where an adjudicator can easily compare multiple cluster hierarchy annotations and resolve disagreements.

* EMNLP 2023

Via

Access Paper or Ask Questions

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?

Jun 06, 2023

Shafiuddin Rehan Ahmed, Abhijnan Nath, Michael Regan, Adam Pollins, Nikhil Krishnaswamy, James H. Martin

Abstract:Annotating cross-document event coreference links is a time-consuming and cognitively demanding task that can compromise annotation quality and efficiency. To address this, we propose a model-in-the-loop annotation approach for event coreference resolution, where a machine learning model suggests likely corefering event pairs only. We evaluate the effectiveness of this approach by first simulating the annotation process and then, using a novel annotator-centric Recall-Annotation effort trade-off metric, we compare the results of various underlying models and datasets. We finally present a method for obtaining 97\% recall while substantially reducing the workload required by a fully manual annotation process. Code and data can be found at https://github.com/ahmeshaf/model_in_coref

* The 17th Liguistics Annotation Workshop, 2023 (LAW-XVII) short paper. 10 pages, 6 figures, 1 table

Via

Access Paper or Ask Questions

Contrastive Representation Learning for Cross-Document Coreference Resolution of Events and Entities

May 23, 2022

Benjamin Hsu, Graham Horwood

Figure 1 for Contrastive Representation Learning for Cross-Document Coreference Resolution of Events and Entities

Figure 2 for Contrastive Representation Learning for Cross-Document Coreference Resolution of Events and Entities

Figure 3 for Contrastive Representation Learning for Cross-Document Coreference Resolution of Events and Entities

Figure 4 for Contrastive Representation Learning for Cross-Document Coreference Resolution of Events and Entities

Abstract:Identifying related entities and events within and across documents is fundamental to natural language understanding. We present an approach to entity and event coreference resolution utilizing contrastive representation learning. Earlier state-of-the-art methods have formulated this problem as a binary classification problem and leveraged large transformers in a cross-encoder architecture to achieve their results. For large collections of documents and corresponding set of $n$ mentions, the necessity of performing $n^{2}$ transformer computations in these earlier approaches can be computationally intensive. We show that it is possible to reduce this burden by applying contrastive learning techniques that only require $n$ transformer computations at inference time. Our method achieves state-of-the-art results on a number of key metrics on the ECB+ corpus and is competitive on others.

* NAACL 2022

Via

Access Paper or Ask Questions

Topic:Event Cross Document Coreference Resolution

Papers and Code