Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Angelo Ortiz Tandazo

MauBERT: Universal Phonetic Inductive Biases for Few-Shot Acoustic Units Discovery

Dec 22, 2025

Angelo Ortiz Tandazo, Manel Khentout, Youssef Benchekroun, Thomas Hueber, Emmanuel Dupoux

Abstract:This paper introduces MauBERT, a multilingual extension of HuBERT that leverages articulatory features for robust cross-lingual phonetic representation learning. We continue HuBERT pre-training with supervision based on a phonetic-to-articulatory feature mapping in 55 languages. Our models learn from multilingual data to predict articulatory features or phones, resulting in language-independent representations that capture multilingual phonetic properties. Through comprehensive ABX discriminability testing, we show MauBERT models produce more context-invariant representations than state-of-the-art multilingual self-supervised learning models. Additionally, the models effectively adapt to unseen languages and casual speech with minimal self-supervised fine-tuning (10 hours of speech). This establishes an effective approach for instilling linguistic inductive biases in self-supervised speech models.

Via

Access Paper or Ask Questions

Simulating Articulatory Trajectories with Phonological Feature Interpolation

Aug 08, 2024

Angelo Ortiz Tandazo, Thomas Schatz, Thomas Hueber, Emmanuel Dupoux

Figure 1 for Simulating Articulatory Trajectories with Phonological Feature Interpolation

Figure 2 for Simulating Articulatory Trajectories with Phonological Feature Interpolation

Figure 3 for Simulating Articulatory Trajectories with Phonological Feature Interpolation

Figure 4 for Simulating Articulatory Trajectories with Phonological Feature Interpolation

Abstract:As a first step towards a complete computational model of speech learning involving perception-production loops, we investigate the forward mapping between pseudo-motor commands and articulatory trajectories. Two phonological feature sets, based respectively on generative and articulatory phonology, are used to encode a phonetic target sequence. Different interpolation techniques are compared to generate smooth trajectories in these feature spaces, with a potential optimisation of the target value and timing to capture co-articulation effects. We report the Pearson correlation between a linear projection of the generated trajectories and articulatory data derived from a multi-speaker dataset of electromagnetic articulography (EMA) recordings. A correlation of 0.67 is obtained with an extended feature set based on generative phonology and a linear interpolation technique. We discuss the implications of our results for our understanding of the dynamics of biological motion.

* accepted at Interspeech 2024

Via

Access Paper or Ask Questions