Prosody Prediction


Prosody prediction is the process of predicting the intonation, rhythm, and stress patterns of speech.

Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

Mar 30, 2026

Dual-Model Prediction of Affective Engagement and Vocal Attractiveness from Speaker Expressiveness in Video Learning

Mar 19, 2026

EmoTaG: Emotion-Aware Talking Head Synthesis on Gaussian Splatting with Few-Shot Personalization

Mar 22, 2026

Foundation Model Embeddings Meet Blended Emotions: A Multimodal Fusion Approach for the BLEMORE Challenge

Mar 24, 2026

Causal Prosody Mediation for Text-to-Speech: Counterfactual Training of Duration, Pitch, and Energy in FastSpeech2

Mar 12, 2026

Collecting Prosody in the Wild: A Content-Controlled, Privacy-First Smartphone Protocol and Empirical Evaluation

Mar 17, 2026

DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models

Mar 17, 2026

End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation

Mar 02, 2026

CC-G2PnP: Streaming Grapheme-to-Phoneme and prosody with Conformer-CTC for unsegmented languages

Feb 19, 2026

The Role of Prosodic and Lexical Cues in Turn-Taking with Self-Supervised Speech Representations

Jan 20, 2026