Vladimir Dobrovolskii

RuCoCo: a new Russian corpus with coreference annotation

Jun 10, 2022

Vladimir Dobrovolskii, Mariia Michurina, Alexandra Ivoylova

Abstract: We present a new corpus with coreference annotation, the Russian Coreference Corpus (RuCoCo). The goal of RuCoCo is to obtain a large number of annotated texts while maintaining high inter-annotator agreement. RuCoCo contains news texts in Russian: some were annotated from scratch, and for the rest, machine-generated annotations were refined by human annotators. The corpus contains one million words and around 150,000 mentions. We make the corpus publicly available.

Word-Level Coreference Resolution

Sep 09, 2021

Vladimir Dobrovolskii

Abstract: Recent coreference resolution models rely heavily on span representations to find coreference links between word spans. As the number of spans is $O(n^2)$ in the length of text and the number of potential links is $O(n^4)$, various pruning techniques are necessary to make this approach computationally feasible. We propose instead to consider coreference links between individual words rather than word spans and then reconstruct the word spans. This reduces the complexity of the coreference model to $O(n^2)$ and allows it to consider all potential mentions without pruning any of them out. We also demonstrate that, with these changes, SpanBERT for coreference resolution will be significantly outperformed by RoBERTa. While being highly efficient, our model performs competitively with recent coreference resolution systems on the OntoNotes benchmark.

* Accepted to EMNLP-2021
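
The complexity argument in the abstract above can be made concrete with a little counting. The sketch below is only an illustration, not code from the paper: it assumes every contiguous span counts as a candidate mention and every unordered pair of candidates counts as a potential link. The function names (`count_spans`, `count_pairs`, `link_counts`) are hypothetical.

```python
def count_spans(n: int) -> int:
    """Number of contiguous word spans in a text of n words: n(n+1)/2, i.e. O(n^2)."""
    return n * (n + 1) // 2


def count_pairs(k: int) -> int:
    """Number of unordered pairs among k candidates: k(k-1)/2."""
    return k * (k - 1) // 2


def link_counts(n: int) -> dict:
    """Compare candidate link counts for span-level and word-level models.

    Assumption (for illustration only): no pruning, and every unordered
    pair of candidates is a potential coreference link.
    """
    spans = count_spans(n)
    return {
        "spans": spans,                    # O(n^2) candidate spans
        "span_links": count_pairs(spans),  # O(n^4) span-to-span links
        "word_links": count_pairs(n),      # O(n^2) word-to-word links
    }


if __name__ == "__main__":
    for n in (10, 100):
        print(n, link_counts(n))
```

For a 100-word text, this counting gives 5,050 spans and 12,748,725 span-to-span links, against 4,950 word-to-word links, which is why span-based models need pruning while a word-level model can score every pair.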
