Martijn Bentum

Tracking the emergence of linguistic structure in self-supervised models learning from speech

Apr 02, 2026

Marianne de Heer Kloots, Martijn Bentum, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem Zuidema

Abstract: Self-supervised speech models learn effective representations of spoken language, which have been shown to reflect various aspects of linguistic structure. But when does such structure emerge in model training? We study the encoding of a wide range of linguistic structures, across layers and intermediate checkpoints of six Wav2Vec2 and HuBERT models trained on spoken Dutch. We find that different levels of linguistic structure show notably distinct layerwise patterns as well as learning trajectories, which can partially be explained by differences in their degree of abstraction from the acoustic signal and the timescale at which information from the input is integrated. Moreover, we find that the level at which pre-training objectives are defined strongly affects both the layerwise organization and the learning trajectories of linguistic structures, with greater parallelism induced by higher-order prediction tasks (i.e. iteratively refined pseudo-labels).

Connecting Humanities and Social Sciences: Applying Language and Speech Technology to Online Panel Surveys

Feb 21, 2023

Henk van den Heuvel, Martijn Bentum, Simone Wills, Judith C. Koops

Abstract: In this paper, we explore the application of language and speech technology to open-ended questions in a Dutch panel survey. In an experimental wave, respondents could choose to answer open questions via speech or keyboard. Automatic speech recognition (ASR) was used to process spoken responses. We evaluated answers from these input modalities to investigate differences between spoken and typed answers. We report the errors the ASR system produces and investigate the impact of these errors on downstream analyses. Open-ended questions give respondents more freedom in answering, but entail a non-trivial amount of work to analyse. We evaluated the feasibility of using transformer-based models (e.g. BERT) to apply sentiment analysis and topic modelling to the answers to open questions. A big advantage of transformer-based models is that they are trained on a large amount of language material and do not necessarily need training on the target material. This is especially advantageous for survey data, which does not contain much text. We tested the quality of automatic sentiment analysis by comparing automatic labeling with that of three human raters, and tested the robustness of topic modelling by comparing the models generated from automatically and manually transcribed spoken answers.

* 7 pages
