Speech-driven 3D facial animation aims to generate realistic facial movements synchronized with audio. Traditional methods primarily minimize a reconstruction loss that aligns each frame with the ground truth. However, this frame-wise approach often fails to capture the continuity of facial motion and ignores coarticulation effects, leading to jittery and unnatural outputs. To address this, we propose a novel phonetic context-aware loss, which explicitly models the influence of phonetic context on viseme transitions. By incorporating a viseme coarticulation weight, we assign adaptive importance to facial movements based on their dynamic changes over time, ensuring smoother and perceptually consistent animations. Extensive experiments demonstrate that replacing the conventional reconstruction loss with the proposed loss improves both quantitative metrics and visual quality. This highlights the importance of explicitly modeling phonetic context-dependent visemes when synthesizing natural speech-driven 3D facial animation. Project page: https://cau-irislab.github.io/interspeech25/
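The abstract does not specify how the coarticulation weight is computed, so the following is a minimal PyTorch sketch of one plausible form: a frame-wise reconstruction loss reweighted by the magnitude of ground-truth motion between frames, so that frames with stronger viseme transitions contribute more. The function name, the `alpha` scaling parameter, and the motion-based weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import torch


def coarticulation_weighted_loss(pred, gt, alpha=1.0, eps=1e-8):
    """Illustrative reconstruction loss with a per-frame coarticulation weight.

    pred, gt: (T, V, 3) tensors of predicted / ground-truth facial vertices.
    The weight for each frame is derived from the ground-truth frame-to-frame
    motion magnitude (an assumption), so frames with larger viseme transitions
    are emphasized relative to nearly static frames.
    """
    # Per-frame motion magnitude of the ground truth; pad the first frame
    # so the weight sequence has length T.
    motion = (gt[1:] - gt[:-1]).norm(dim=-1).mean(dim=-1)   # (T-1,)
    motion = torch.cat([motion[:1], motion])                 # (T,)

    # Normalize to a weight centered around 1: frames with above-average
    # motion receive weights greater than 1.
    weight = 1.0 + alpha * motion / (motion.mean() + eps)    # (T,)

    # Frame-wise squared error, averaged over vertices and coordinates.
    frame_err = ((pred - gt) ** 2).mean(dim=(-1, -2))        # (T,)

    # Weighted average over the sequence.
    return (weight * frame_err).mean()
```

In such a formulation, the weighted loss could simply replace the conventional frame-wise reconstruction term during training, which matches the abstract's claim that the proposed loss is a drop-in substitute for the standard one.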