Title: Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

May 09, 2026

Ahan Chatterjee, Matthias Schöffel, Matthias Aßenmacher, Esteban Garces Arias

Abstract: The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine). In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexical level, we evaluate the contribution of morphological features to gender prediction. At the contextual level, we quantify the contributions of different part-of-speech categories to grammatical gender prediction. Together, these analyses characterize the distribution of gender information between the lemma and its sentential context. We make our codebase, datasets, and results publicly available.

* Accepted at NLP4DH @ ACL 2026
