Yilun Zhu

Incorporating Singletons and Mention-based Features in Coreference Resolution via Multi-task Learning for Better Generalization

Sep 20, 2023
Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

Previous attempts to incorporate a mention detection step into end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention span data as well as other entity information. This paper presents a coreference model that learns singletons as well as features such as entity type and information status via a multi-task learning-based approach. The approach achieves new state-of-the-art scores on the OntoGUM benchmark (+2.7 points) and increases robustness on multiple out-of-domain datasets (+2.3 points on average), likely because mention detection generalizes better and singletons contribute additional training data compared with matching only coreferent mention pairs.
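
As a rough illustration of the multi-task idea described above, the sketch below shares a span representation across a mention/singleton scorer and auxiliary entity-type and information-status heads, combining their losses with weights. All module names, dimensions, and loss weights are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskCoref(nn.Module):
    """Hedged sketch: shared span representations feed a mention scorer
    (covering singletons) plus auxiliary heads for entity type and
    information status. Dimensions and weights are assumptions."""
    def __init__(self, hidden=768, n_entity_types=10, n_infstat=3):
        super().__init__()
        self.mention_scorer = nn.Linear(hidden, 1)
        self.entity_head = nn.Linear(hidden, n_entity_types)
        self.infstat_head = nn.Linear(hidden, n_infstat)

    def forward(self, span_reprs):  # span_reprs: (n_spans, hidden)
        return (self.mention_scorer(span_reprs).squeeze(-1),
                self.entity_head(span_reprs),
                self.infstat_head(span_reprs))

def joint_loss(outputs, targets, w_mention=1.0, w_ent=0.5, w_inf=0.5):
    mention_logits, ent_logits, inf_logits = outputs
    mention_y, ent_y, inf_y = targets  # float 0/1 labels, class ids, class ids
    return (w_mention * F.binary_cross_entropy_with_logits(mention_logits, mention_y)
            + w_ent * F.cross_entropy(ent_logits, ent_y)
            + w_inf * F.cross_entropy(inf_logits, inf_y))
```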

* IJCNLP-AACL 2023 

Fusing Sparsity with Deep Learning for Rotating Scatter Mask Gamma Imaging

Jul 29, 2023
Yilun Zhu, Clayton Scott, Darren Holland, George Landon, Aaron Fjeldsted, Azaree Lintereur

Many nuclear safety applications need fast, portable, and accurate imagers to better locate radiation sources. The Rotating Scatter Mask (RSM) system is an emerging device with the potential to meet these needs. The main challenge is the under-determined nature of the data acquisition process: the dimension of the measured signal is far smaller than the dimension of the image to be reconstructed. To address this challenge, this work fuses model-based sparsity-promoting regularization with a data-driven deep neural network denoising prior for image reconstruction. An efficient algorithm is developed and produces superior reconstructions relative to current approaches.
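
The fusion described above can be pictured as a proximal-gradient loop whose sparsity (soft-thresholding) step is paired with a learned denoising step. The sketch below is a generic plug-and-play illustration under assumed names (A for the system matrix, denoiser for the trained network), not the paper's specific algorithm.

```python
import numpy as np

def soft_threshold(x, t):
    # proximal operator of the l1 norm: the sparsity-promoting step
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def reconstruct(y, A, denoiser, lam=0.01, iters=100):
    """Generic plug-and-play proximal-gradient sketch (an assumption,
    not the paper's algorithm): y is the low-dimensional measurement,
    A the forward model, denoiser a trained image -> image network."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)             # data-fidelity gradient
        x = soft_threshold(x - step * grad, step * lam)
        x = denoiser(x)                      # data-driven denoising prior
    return x
```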

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

Jun 03, 2023
Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find severe performance degradation on all tasks for at least some genres, indicating GENTLE's utility as an evaluation dataset for NLP systems.

* Camera-ready for LAW-XVII collocated with ACL 2023 

Mixture Proportion Estimation Beyond Irreducibility

Jun 02, 2023
Yilun Zhu, Aaron Fjeldsted, Darren Holland, George Landon, Azaree Lintereur, Clayton Scott

The task of mixture proportion estimation (MPE) is to estimate the weight of a component distribution in a mixture, given observations from both the component and mixture. Previous work on MPE adopts the irreducibility assumption, which ensures identifiability of the mixture proportion. In this paper, we propose a more general sufficient condition that accommodates several settings of interest where irreducibility does not hold. We further present a resampling-based meta-algorithm that takes any existing MPE algorithm designed to work under irreducibility and adapts it to work under our more general condition. Our approach empirically exhibits improved estimation performance relative to baseline methods and to a recently proposed regrouping-based algorithm.
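
To make the meta-algorithm concrete in spirit, here is a deliberately simplified resampling wrapper: draw bootstrap resamples of both samples, run any base MPE estimator on each, and aggregate. The resampling scheme and aggregation are illustrative assumptions; the paper's procedure is more specific about how resampling restores the conditions the base algorithm needs.

```python
import numpy as np

def resampled_mpe(component, mixture, base_estimator, n_rounds=50, seed=0):
    """Simplified sketch of a resampling meta-algorithm for MPE.
    base_estimator: any callable (component, mixture) -> proportion in [0, 1].
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_rounds):
        comp_rs = component[rng.integers(0, len(component), len(component))]
        mix_rs = mixture[rng.integers(0, len(mixture), len(mixture))]
        estimates.append(base_estimator(comp_rs, mix_rs))
    return float(np.median(estimates))  # robust aggregate over rounds
```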

Findings of the Shared Task on Multilingual Coreference Resolution

Sep 16, 2022
Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu

This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop. Shared task participants were asked to develop trainable systems capable of identifying mentions and clustering them according to identity coreference. The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data. The CoNLL score used in previous coreference-oriented shared tasks was used as the main evaluation metric. There were 8 coreference prediction systems submitted by 5 participating teams; in addition, there was a competitive Transformer-based baseline system provided by the organizers at the beginning of the shared task. The winning system outperformed the baseline by 12 percentage points (in terms of CoNLL scores averaged across all datasets for individual languages).
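
For reference, the CoNLL score mentioned above is the unweighted mean of the F1 scores of three standard coreference metrics (MUC, B-cubed, and entity-based CEAF):

```python
def conll_score(muc_f1, b_cubed_f1, ceaf_e_f1):
    # CoNLL score: unweighted average of the three metric F1 values
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0
```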

Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

Oct 12, 2021
Yilun Zhu, Sameer Pradhan, Amir Zeldes

SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark. However, the lack of comparable data following the same scheme for more genres makes it difficult to evaluate generalizability to open-domain data. Zhu et al. (2021) introduced the OntoGUM corpus for evaluating the generalizability of the latest neural LM-based end-to-end systems. This paper covers details of the mapping process, a set of deterministic rules applied to the rich syntactic and discourse annotations manually created in the GUM corpus. Out-of-domain evaluation across 12 genres shows nearly 15-20% degradation for both deterministic and deep learning systems, indicating a lack of generalizability or covert overfitting in existing coreference resolution models.
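
One concrete example of such a deterministic rule (illustrative, not the paper's full rule set): the OntoNotes scheme omits singleton mentions, so a GUM-to-OntoNotes conversion must drop coreference clusters of size one.

```python
def drop_singletons(clusters):
    """Remove size-one clusters, since the OntoNotes scheme only keeps
    mentions that corefer with at least one other mention. Clusters are
    lists of (start, end) token spans."""
    return [c for c in clusters if len(c) > 1]

# e.g. [[(0, 2), (5, 6)], [(8, 9)]] -> [[(0, 2), (5, 6)]]
```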

* CRAC 2021. arXiv admin note: substantial text overlap with arXiv:2106.00933 

DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection

Sep 20, 2021
Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

This paper describes our submission to the DISRPT2021 Shared Task on Discourse Unit Segmentation, Connective Detection, and Relation Classification. Our system, called DisCoDisCo, is a Transformer-based neural classifier which enhances contextualized word embeddings (CWEs) with hand-crafted features, relying on tokenwise sequence tagging for discourse segmentation and connective detection, and a feature-rich, encoder-less sentence pair classifier for relation classification. Our results for the first two tasks outperform SOTA scores from the previous 2019 shared task, and results on relation classification suggest strong performance on the new 2021 benchmark. Ablation tests show that including features beyond CWEs is helpful for both tasks, and a partial evaluation of multiple pre-trained Transformer-based language models indicates that models pre-trained on the Next Sentence Prediction (NSP) task are optimal for relation classification.
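
Schematically, the feature enhancement amounts to concatenating each token's contextualized embedding with a small hand-crafted feature vector before the tagging layer; the sketch below uses assumed dimensions and feature counts, not DisCoDisCo's exact feature set.

```python
import torch
import torch.nn as nn

class FeatureRichTagger(nn.Module):
    """Hedged sketch: concatenate contextualized word embeddings (CWEs)
    with hand-crafted per-token features before a linear tagging layer."""
    def __init__(self, cwe_dim=768, n_features=12, n_tags=2):
        super().__init__()
        self.tagger = nn.Linear(cwe_dim + n_features, n_tags)

    def forward(self, cwes, features):
        # cwes: (seq_len, cwe_dim); features: (seq_len, n_features)
        return self.tagger(torch.cat([cwes, features], dim=-1))
```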

* System submission for the CODI-DISRPT 2021 Shared Task on Discourse Processing across Formalisms. 1st place in all subtasks 

OntoGUM: Evaluating Contextualized SOTA Coreference Resolution on 12 More Genres

Jun 03, 2021
Yilun Zhu, Sameer Pradhan, Amir Zeldes

SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark. However, the lack of comparable data following the same scheme for more genres makes it difficult to evaluate generalizability to open-domain data. This paper provides a dataset and comprehensive evaluation showing that the latest neural LM-based end-to-end systems degrade very substantially out of domain. We make publicly available an OntoNotes-like coreference dataset, OntoGUM, converted from GUM, an English corpus covering 12 genres, using deterministic rules, and evaluate it. Thanks to the rich syntactic and discourse annotations in GUM, we are able to create the largest human-annotated coreference corpus following the OntoNotes guidelines, and the first to be evaluated for consistency with the OntoNotes scheme. Out-of-domain evaluation across 12 genres shows nearly 15-20% degradation for both deterministic and deep learning systems, indicating a lack of generalizability or covert overfitting in existing coreference resolution models.

* ACL 2021 

AMALGUM -- A Free, Balanced, Multilayer English Web Corpus

Jun 18, 2020
Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes

We present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory. By tapping open online data sources, the corpus is meant to offer a more sizable alternative to smaller manually created annotated datasets, while avoiding pitfalls such as imbalanced or unknown composition, licensing problems, and low-quality natural language processing. We harness knowledge from multiple annotation layers in order to achieve a "better than NLP" benchmark and evaluate the accuracy of the resulting resource.

* In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 5267-5275), 2020  
* Accepted at LREC 2020. See https://www.aclweb.org/anthology/2020.lrec-1.648/ (note: ACL Anthology's title is currently out of date) 

A Corpus of Adpositional Supersenses for Mandarin Chinese

Mar 18, 2020
Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext.
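
To illustrate the style of annotation (a plausible constructed example, not an item from the corpus): each adposition token receives a supersense from the cross-linguistic inventory, e.g. a locative use of Mandarin 在 labeled Locus.

```python
# Constructed example (not from the corpus): one adposition token with
# a supersense from the Schneider et al. (2018) inventory.
token_annotation = {
    "token": "在",          # Mandarin adposition, roughly "at/in"
    "gloss": "at",
    "supersense": "Locus",  # supersense for spatial location
}
```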

* LREC 2020 camera-ready 