Title: Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Oct 21, 2020

Jiaming Luo, Frederik Hartmann, Enrico Santus, Yuan Cao, Regina Barzilay

Abstract: Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure of language closeness that correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the position favored by current scholarship.

* TACL 2020, pre-MIT Press publication version
