Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yilun Zhu

GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

Nov 01, 2024

Yang Janet Liu, Tatsuya Aoyama, Wesley Scivetti, Yilun Zhu, Shabnam Behzad, Lauren Elizabeth Levine, Jessica Lin, Devika Tiwari, Amir Zeldes

Figure 1 for GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

Figure 2 for GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

Figure 3 for GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

Figure 4 for GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

Abstract:Work on shallow discourse parsing in English has focused on the Wall Street Journal corpus, the only large-scale dataset for the language in the PDTB framework. However, the data is not openly available, is restricted to the news domain, and is by now 35 years old. In this paper, we present and evaluate a new open-access, multi-genre benchmark for PDTB-style shallow discourse parsing, based on the existing UD English GUM corpus, for which discourse relation annotations in other frameworks already exist. In a series of experiments on cross-domain relation classification, we show that while our dataset is compatible with PDTB, substantial out-of-domain degradation is observed, which can be alleviated by joint training on both datasets.

* Accepted to EMNLP 2024 (main, long); camera-ready version

Via

Access Paper or Ask Questions

Label Noise: Ignorance Is Bliss

Oct 31, 2024

Yilun Zhu, Jianxin Zhang, Aditya Gangrade, Clayton Scott

Abstract:We establish a new theoretical framework for learning under multi-class, instance-dependent label noise. This framework casts learning with label noise as a form of domain adaptation, in particular, domain adaptation under posterior drift. We introduce the concept of \emph{relative signal strength} (RSS), a pointwise measure that quantifies the transferability from noisy to clean posterior. Using RSS, we establish nearly matching upper and lower bounds on the excess risk. Our theoretical findings support the simple \emph{Noise Ignorant Empirical Risk Minimization (NI-ERM)} principle, which minimizes empirical risk while ignoring label noise. Finally, we translate this theoretical insight into practice: by using NI-ERM to fit a linear classifier on top of a self-supervised feature extractor, we achieve state-of-the-art performance on the CIFAR-N data challenge.

Via

Access Paper or Ask Questions

SPLICE: A Singleton-Enhanced PipeLIne for Coreference REsolution

Mar 25, 2024

Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

Abstract:Singleton mentions, i.e.~entities mentioned only once in a text, are important to how humans understand discourse from a theoretical perspective. However previous attempts to incorporate their detection in end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention spans in the OntoNotes benchmark. This paper addresses this limitation by combining predicted mentions from existing nested NER systems and features derived from OntoNotes syntax trees. With this approach, we create a near approximation of the OntoNotes dataset with all singleton mentions, achieving ~94% recall on a sample of gold singletons. We then propose a two-step neural mention and coreference resolution system, named SPLICE, and compare its performance to the end-to-end approach in two scenarios: the OntoNotes test set and the out-of-domain (OOD) OntoGUM corpus. Results indicate that reconstructed singleton training yields results comparable to end-to-end systems for OntoNotes, while improving OOD stability (+1.1 avg. F1). We conduct error analysis for mention detection and delve into its impact on coreference clustering, revealing that precision improvements deliver more substantial benefits than increases in recall for resolving coreference chains.

* Accepted to LREC-COLING 2024

Via

Access Paper or Ask Questions

Can Large Language Models Understand Context?

Feb 01, 2024

Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng

Figure 1 for Can Large Language Models Understand Context?

Figure 2 for Can Large Language Models Understand Context?

Figure 3 for Can Large Language Models Understand Context?

Figure 4 for Can Large Language Models Understand Context?

Abstract:Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the evaluation of LLMs encompasses various domains within the realm of Natural Language Processing, limited attention has been paid to probing their linguistic capability of understanding contextual features. This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models. This benchmark comprises of four distinct tasks and nine datasets, all featuring prompts designed to assess the models' ability to understand context. First, we evaluate the performance of LLMs under the in-context learning pretraining scenario. Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models. Second, as LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context-learning settings. We find that 3-bit post-training quantization leads to varying degrees of performance reduction on our benchmark. We conduct an extensive analysis of these scenarios to substantiate our experimental results.

* Findings of EACL 2024

Via

Access Paper or Ask Questions

Incorporating Singletons and Mention-based Features in Coreference Resolution via Multi-task Learning for Better Generalization

Sep 20, 2023

Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

Abstract:Previous attempts to incorporate a mention detection step into end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention span data as well as other entity information. This paper presents a coreference model that learns singletons as well as features such as entity type and information status via a multi-task learning-based approach. This approach achieves new state-of-the-art scores on the OntoGUM benchmark (+2.7 points) and increases robustness on multiple out-of-domain datasets (+2.3 points on average), likely due to greater generalizability for mention detection and utilization of more data from singletons when compared to only coreferent mention pair matching.

* IJCNLP-AACL 2023

Via

Access Paper or Ask Questions

Fusing Sparsity with Deep Learning for Rotating Scatter Mask Gamma Imaging

Jul 29, 2023

Yilun Zhu, Clayton Scott, Darren Holland, George Landon, Aaron Fjeldsted, Azaree Lintereur

Abstract:Many nuclear safety applications need fast, portable, and accurate imagers to better locate radiation sources. The Rotating Scatter Mask (RSM) system is an emerging device with the potential to meet these needs. The main challenge is the under-determined nature of the data acquisition process: the dimension of the measured signal is far less than the dimension of the image to be reconstructed. To address this challenge, this work aims to fuse model-based sparsity-promoting regularization and a data-driven deep neural network denoising image prior to perform image reconstruction. An efficient algorithm is developed and produces superior reconstructions relative to current approaches.

Via

Access Paper or Ask Questions

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

Jun 03, 2023

Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

Abstract:We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks, which indicates GENTLE's utility as an evaluation dataset for NLP systems.

* Camera-ready for LAW-XVII collocated with ACL 2023

Via

Access Paper or Ask Questions

Mixture Proportion Estimation Beyond Irreducibility

Jun 02, 2023

Yilun Zhu, Aaron Fjeldsted, Darren Holland, George Landon, Azaree Lintereur, Clayton Scott

Figure 1 for Mixture Proportion Estimation Beyond Irreducibility

Figure 2 for Mixture Proportion Estimation Beyond Irreducibility

Figure 3 for Mixture Proportion Estimation Beyond Irreducibility

Figure 4 for Mixture Proportion Estimation Beyond Irreducibility

Abstract:The task of mixture proportion estimation (MPE) is to estimate the weight of a component distribution in a mixture, given observations from both the component and mixture. Previous work on MPE adopts the irreducibility assumption, which ensures identifiablity of the mixture proportion. In this paper, we propose a more general sufficient condition that accommodates several settings of interest where irreducibility does not hold. We further present a resampling-based meta-algorithm that takes any existing MPE algorithm designed to work under irreducibility and adapts it to work under our more general condition. Our approach empirically exhibits improved estimation performance relative to baseline methods and to a recently proposed regrouping-based algorithm.

Via

Access Paper or Ask Questions

Findings of the Shared Task on Multilingual Coreference Resolution

Sep 16, 2022

Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu

Figure 1 for Findings of the Shared Task on Multilingual Coreference Resolution

Figure 2 for Findings of the Shared Task on Multilingual Coreference Resolution

Figure 3 for Findings of the Shared Task on Multilingual Coreference Resolution

Figure 4 for Findings of the Shared Task on Multilingual Coreference Resolution

Abstract:This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop. Shared task participants were supposed to develop trainable systems capable of identifying mentions and clustering them according to identity coreference. The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data. The CoNLL score used in previous coreference-oriented shared tasks was used as the main evaluation metric. There were 8 coreference prediction systems submitted by 5 participating teams; in addition, there was a competitive Transformer-based baseline system provided by the organizers at the beginning of the shared task. The winner system outperformed the baseline by 12 percentage points (in terms of the CoNLL scores averaged across all datasets for individual languages).

Via

Access Paper or Ask Questions

Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

Oct 12, 2021

Yilun Zhu, Sameer Pradhan, Amir Zeldes

Figure 1 for Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

Figure 2 for Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

Figure 3 for Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

Figure 4 for Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

Abstract:SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark. However lack of comparable data following the same scheme for more genres makes it difficult to evaluate generalizability to open domain data. Zhu et al. (2021) introduced the creation of the OntoGUM corpus for evaluating geralizability of the latest neural LM-based end-to-end systems. This paper covers details of the mapping process which is a set of deterministic rules applied to the rich syntactic and discourse annotations manually annotated in the GUM corpus. Out-of-domain evaluation across 12 genres shows nearly 15-20% degradation for both deterministic and deep learning systems, indicating a lack of generalizability or covert overfitting in existing coreference resolution models.

* CRAC 2021. arXiv admin note: substantial text overlap with arXiv:2106.00933

Via

Access Paper or Ask Questions