Title: Morphology Matters: A Multilingual Language Modeling Analysis

Dec 11, 2020

Hyunji Hayley Park, Katherine J. Zhang, Coleman Haley, Kenneth Steimel, Han Liu, Lane Schwartz

Abstract: Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language's morphology on language modeling.

* To appear in TACL (pre-MIT Press publication version); 15 pages, 3 figures. For the datasets, see https://github.com/hayleypark/MorphologyMatters