Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

Jun 18, 2026

Masato Takagi, Masaya Kawamura, Reo Shimizu, Yuma Shirahata

Share this with someone who'll enjoy it:

Abstract:Mean opinion score (MOS) prediction models are widely used as proxy metrics in text-to-speech (TTS) research, yet their ability to capture quality differences beyond acoustic fidelity remains unclear. We investigate this via controlled perturbations on speech: acoustic degradation, prosodic errors, and manipulation of speaker-specific characteristics such as pitch and speaking rate. We obtained MOS predictions for these speech samples from both human listeners and the model, and analyzed the differences in their perceptual characteristics. Results show that most models track acoustic degradation well, while all are insensitive to prosodic errors despite large subjective score drops. For speaker characteristics, models exhibit a double dissociation: strong mean fundamental frequency (F0) biases absent in human ratings, yet insensitivity to speaking rate and F0 variability that humans notice. These findings highlight limitations of scalar MOS prediction beyond acoustic fidelity.

* Accepted to INTERSPEECH 2026

View paper on

Share this with someone who'll enjoy it:

Title:Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

Paper and Code