Title: Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition

May 29, 2025

Youjun Chen, Xurong Xie, Haoning Xu, Mengzhe Geng, Guinan Li, Chengxi Deng, Huimeng Wang, Shujie Hu, Xunying Liu

Abstract: This paper presents a novel end-to-end LLM-empowered explainable speech emotion recognition (SER) approach. Fine-grained speech emotion descriptor (SED) features, e.g., pitch, tone and emphasis, are disentangled from HuBERT SSL representations via alternating LLM fine-tuning on joint SER-SED prediction and ASR tasks. VAE-compressed HuBERT features derived via the Information Bottleneck (IB) are used to adjust feature granularity. Experiments on the IEMOCAP and MELD benchmarks demonstrate that our approach consistently outperforms comparable LLaMA-based SER baselines, including those using either (a) alternating multi-task fine-tuning alone or (b) feature disentanglement only. Statistically significant increases in SER unweighted accuracy of up to 4.0% and 3.7% absolute (5.4% and 6.6% relative) are obtained. More importantly, the emotion descriptors offer further explainability for SER.
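The abstract reports gains in unweighted accuracy in both absolute and relative terms. As a rough illustration only, the sketch below assumes the common SER convention that unweighted accuracy is the mean of per-class recalls, and shows how an absolute gain converts to a relative one. The function names and the toy labels are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given here.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally.

    Assumption: this is the usual SER definition of unweighted accuracy;
    the paper's own scoring script may differ in detail.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (both in [0, 1])."""
    return (improved - baseline) / baseline


# Toy labels, invented purely for illustration.
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "sad", "sad"]
ua = unweighted_accuracy(y_true, y_pred)

# A hypothetical 2-point absolute gain on a 50% baseline.
gain = relative_gain(0.50, 0.52)
```

Because each class contributes equally, the rare "sad" class in the toy data weighs as much as the frequent "happy" class, which is why this metric is often preferred on imbalanced emotion corpora.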

* Accepted by INTERSPEECH 2025
