Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bogdan Vlasenko

Toward using Speech to Sense Student Emotion in Remote Learning Environments

Apr 10, 2026

Sargam Vyas, Bogdan Vlasenko, André Mayoraz, Egon Werlen, Per Bergamin, Mathew Magimai. -Doss

Abstract:With advancements in multimodal communication technologies, remote learning environments such as, distance universities are increasing. Remote learning typically happens asynchronously. As a consequence, unlike face-to-face in-person classroom teaching, this lacks availability of sufficient emotional cues for making learning a pleasant experience. Motivated by advances made in the paralinguistic speech processing community on emotion prediction, in this paper we explore use of speech for sensing students' emotions by building upon speech-based self-control tasks developed to aid effective remote learning. More precisely, we investigate: (a) whether speech acquired through self-control tasks exhibit perceptible variation along valence, arousal, and dominance dimensions? and (b) whether those dimensional emotion variations can be automatically predicted? We address these two research questions by developing a dataset containing spontaneous monologue speech acquired as open responses to self-control tasks and by carrying out subjective listener evaluations and automatic dimensional emotion prediction studies on that dataset. Our investigations indicate that speech-based self-control tasks can be a means to sense student emotion in remote learning environment. This opens potential venues to seamlessly integrate paralinguistic speech processing technologies in the remote learning loop for enhancing learning experiences through instructional design and feedback generation.

Via

Access Paper or Ask Questions

Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Jun 23, 2022

Tilak Purohit, Imen Ben Mahmoud, Bogdan Vlasenko, Mathew Magimai. -Doss

Figure 1 for Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Figure 2 for Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Figure 3 for Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Figure 4 for Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track

Abstract:The ICML Expressive Vocalizations (ExVo) Multi-task challenge 2022, focuses on understanding the emotional facets of the non-linguistic vocalizations (vocal bursts (VB)). The objective of this challenge is to predict emotional intensities for VB, being a multi-task challenge it also requires to predict speakers' age and native-country. For this challenge we study and compare two distinct embedding spaces namely, self-supervised learning (SSL) based embeddings and task-specific supervised learning based embeddings. Towards that, we investigate feature representations obtained from several pre-trained SSL neural networks and task-specific supervised classification neural networks. Our studies show that the best performance is obtained with a hybrid approach, where predictions derived via both SSL and task-specific supervised learning are used. Our best system on test-set surpasses the ComPARE baseline (harmonic mean of all sub-task scores i.e., $S_{MTL}$) by a relative $13\%$ margin.

Via

Access Paper or Ask Questions