Shengkun Ma

MRCEval: A Comprehensive, Challenging and Accessible Machine Reading Comprehension Benchmark

Mar 10, 2025

Shengkun Ma, Hao Peng, Lei Hou, Juanzi Li

Abstract: Machine Reading Comprehension (MRC) is an essential task in evaluating natural language understanding. Existing MRC datasets primarily assess specific aspects of reading comprehension (RC), lacking a comprehensive MRC benchmark. To fill this gap, we first introduce a novel taxonomy that categorizes the key capabilities required for RC. Based on this taxonomy, we construct MRCEval, an MRC benchmark that leverages advanced Large Language Models (LLMs) as both sample generators and selection judges. MRCEval is a comprehensive, challenging and accessible benchmark designed to assess the RC capabilities of LLMs thoroughly, covering 13 distinct RC skills with a total of 2.1K high-quality multi-choice questions. We perform an extensive evaluation of 28 widely used open-source and proprietary models, highlighting that MRC continues to present significant challenges even in the era of LLMs.

* Under review

A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Apr 02, 2024

Bowen Ding, Qingkai Min, Shengkun Ma, Yingjie Li, Linyi Yang, Yue Zhang

Abstract: Based on Pre-trained Language Models (PLMs), event coreference resolution (ECR) systems have demonstrated outstanding performance in clustering coreferential events across documents. However, the existing system exhibits an excessive reliance on the 'triggers lexical matching' spurious pattern in the input mention pair text. We formalize the decision-making process of the baseline ECR system using a Structural Causal Model (SCM), aiming to identify spurious and causal associations (i.e., rationales) within the ECR task. Leveraging the debiasing capability of counterfactual data augmentation, we develop a rationale-centric counterfactual data augmentation method with LLM-in-the-loop. This method is specialized for pairwise input in the ECR system, where we conduct direct interventions on triggers and context to mitigate the spurious association while emphasizing the causation. Our approach achieves state-of-the-art performance on three popular cross-document ECR benchmarks and demonstrates robustness in out-of-domain scenarios.

* Accepted to NAACL-24 Main

Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

Feb 24, 2024

Shengkun Ma, Jiale Han, Yi Liang, Bo Cheng

Abstract: Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations, with little labeled training data, while avoiding forgetting old ones. The primary challenges are catastrophic forgetting and overfitting. This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models to address these two challenges, thereby making language models better continual few-shot relation extractors. Specifically, we propose a Contrastive Prompt Learning framework, which designs prompt representation to acquire more generalized knowledge that can be easily adapted to old and new categories, and uses margin-based contrastive learning to focus more on hard samples, thereby alleviating catastrophic forgetting and overfitting. To further remedy overfitting in low-resource scenarios, we introduce an effective memory augmentation strategy that employs well-crafted prompts to guide ChatGPT in generating diverse samples. Extensive experiments demonstrate that our method outperforms state-of-the-art methods by a large margin and significantly mitigates catastrophic forgetting and overfitting in low-resource scenarios.

* Accepted at COLING 2024
