Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Roksana Goworek

Rethinking Metrics for Lexical Semantic Change Detection

Feb 17, 2026

Roksana Goworek, Haim Dubossarsky

Abstract:Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a small set of semantic change metrics, primarily Average Pairwise Distance (APD) and cosine distance over word prototypes (PRT). We introduce Average Minimum Distance (AMD) and Symmetric Average Minimum Distance (SAMD), new measures that quantify semantic change via local correspondence between word usages across time periods. Across multiple languages, encoder models, and representation spaces, we show that AMD often provides more robust performance, particularly under dimensionality reduction and with non-specialised encoders, while SAMD excels with specialised encoders. We suggest that LSCD may benefit from considering alternative semantic change metrics beyond APD and PRT, with AMD offering a robust option for contextualised embedding-based analysis.

* Accepted to the LChange 2026 Workshop, colocated with EACL 2026

Via

Access Paper or Ask Questions

Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks

May 30, 2025

Roksana Goworek, Haim Dubossarsky

Figure 1 for Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks

Figure 2 for Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks

Figure 3 for Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks

Figure 4 for Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks

Abstract:Cross-lingual transfer allows models to perform tasks in languages unseen during training and is often assumed to benefit from increased multilinguality. In this work, we challenge this assumption in the context of two underexplored, sense-aware tasks: polysemy disambiguation and lexical semantic change. Through a large-scale analysis across 28 languages, we show that multilingual training is neither necessary nor inherently beneficial for effective transfer. Instead, we find that confounding factors - such as fine-tuning data composition and evaluation artifacts - better account for the perceived advantages of multilinguality. Our findings call for more rigorous evaluations in multilingual NLP. We release fine-tuned models and benchmarks to support further research, with implications extending to low-resource and typologically diverse languages.

* 8 pages, 8 figures

Via

Access Paper or Ask Questions

SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

May 29, 2025

Roksana Goworek, Harpal Karlcut, Muhammad Shezad, Nijaguna Darshana, Abhishek Mane, Syam Bondada, Raghav Sikka, Ulvi Mammadov, Rauf Allahverdiyev, Sriram Purighella(+3 more)

Abstract:This paper addresses the critical need for high-quality evaluation datasets in low-resource languages to advance cross-lingual transfer. While cross-lingual transfer offers a key strategy for leveraging multilingual pretraining to expand language technologies to understudied and typologically diverse languages, its effectiveness is dependent on quality and suitable benchmarks. We release new sense-annotated datasets of sentences containing polysemous words, spanning nine low-resource languages across diverse language families and scripts. To facilitate dataset creation, the paper presents a demonstrably beneficial semi-automatic annotation method. The utility of the datasets is demonstrated through Word-in-Context (WiC) formatted experiments that evaluate transfer on these low-resource languages. Results highlight the importance of targeted dataset creation and evaluation for effective polysemy disambiguation in low-resource settings and transfer studies. The released datasets and code aim to support further research into fair, robust, and truly multilingual NLP.

* 8 pages, 22 figures, submitted to SIGTYP 2025 workshop in ACL

Via

Access Paper or Ask Questions