Kasia Hitczenko

Employing self-supervised learning models for cross-linguistic child speech maturity classification

Jun 10, 2025

Theo Zhang, Madurya Suresh, Anne S. Warlaumont, Kasia Hitczenko, Alejandrina Cristia, Margaret Cychosz

Abstract: Speech technology systems struggle with many downstream tasks for child speech due to small training corpora and the difficulties that child speech poses. We apply a novel dataset, SpeechMaturity, to state-of-the-art transformer models to address a fundamental classification task: identifying child vocalizations. Unlike previous corpora, our dataset captures maximally ecologically valid child vocalizations across an unprecedented sample, comprising children acquiring 25+ languages in the U.S., Bolivia, Vanuatu, Papua New Guinea, the Solomon Islands, and France. The dataset contains 242,004 labeled vocalizations, magnitudes larger than previous work. Models were trained to distinguish between cry, laughter, mature speech (consonant+vowel), and immature speech (just consonant or vowel). Models trained on the dataset outperformed state-of-the-art models trained on previous datasets, achieved classification accuracy comparable to humans, and were robust across rural and urban settings.

* To be published in Interspeech 2025. 5 pages, 2 figures. For the associated GitHub repository, see https://github.com/spoglab-stanford/w2v2-pro-sm/tree/main/speechbrain/recipes/W2V2-LL4300-Pro-SM

DDKtor: Automatic Diadochokinetic Speech Analysis

Jun 29, 2022

Yael Segal, Kasia Hitczenko, Matthew Goldrick, Adam Buchwald, Angela Roberts, Joseph Keshet

Abstract: Diadochokinetic speech tasks (DDK), in which participants repeatedly produce syllables, are commonly used as part of the assessment of speech motor impairments. These studies rely on manual analyses that are time-intensive, subjective, and provide only a coarse-grained picture of speech. This paper presents two deep neural network models that automatically segment consonants and vowels from unannotated, untranscribed speech. Both models work on the raw waveform and use convolutional layers for feature extraction. The first model is based on an LSTM classifier followed by fully connected layers, while the second model adds more convolutional layers followed by fully connected layers. The segmentations predicted by the models are used to obtain measures of speech rate and sound duration. Results on a dataset of young healthy individuals show that our LSTM model outperforms the current state-of-the-art systems and performs comparably to trained human annotators. Moreover, the LSTM model also achieves results comparable to trained human annotators when evaluated on an unseen dataset of older individuals with Parkinson's Disease.
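
The last step described in the abstract, turning a predicted segmentation into speech-rate and duration measures, can be illustrated with a minimal sketch. This is not the DDKtor implementation: the per-frame label alphabet (`C`, `V`, `-` for silence), the 10 ms frame length, and the definition of rate as vowels per second of the speech span are all assumptions made for illustration, and the function names are hypothetical.

```python
from itertools import groupby


def segments_from_frames(labels, frame_s=0.01):
    """Collapse per-frame labels into (label, start_s, end_s) runs.

    `frame_s` is an assumed frame length, not a value taken from the paper.
    """
    segments = []
    idx = 0
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((label, idx * frame_s, (idx + n) * frame_s))
        idx += n
    return segments


def ddk_measures(segments, consonant="C", vowel="V"):
    """Mean consonant/vowel durations and an assumed syllable-rate measure."""

    def mean_duration(target):
        durs = [end - start for lab, start, end in segments if lab == target]
        return sum(durs) / len(durs) if durs else 0.0

    # Keep only speech segments; everything else is treated as silence.
    speech = [s for s in segments if s[0] in (consonant, vowel)]
    n_syllables = sum(1 for s in speech if s[0] == vowel)  # one vowel per syllable (assumption)
    span = speech[-1][2] - speech[0][1] if speech else 0.0
    return {
        "mean_consonant_s": mean_duration(consonant),
        "mean_vowel_s": mean_duration(vowel),
        "syllables_per_s": n_syllables / span if span > 0 else 0.0,
    }


# Hypothetical frame labels for two repetitions of a consonant+vowel syllable.
frames = list("CCVVVV--CCVVVV")
measures = ddk_measures(segments_from_frames(frames))
```

Counting one syllable per vowel segment keeps the sketch simple; a real analysis would need to decide how to handle pauses and mis-segmented frames.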

* Accepted to Interspeech 2022
