"speech": models, code, and papers

UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model

Jun 01, 2023
Anastasiia Iashchenko, Pavel Andreev, Ivan Shchekotov, Nicholas Babaev, Dmitry Vetrov

ZS-MSTM: Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding

May 22, 2023
Mireille Fares, Catherine Pelachaud, Nicolas Obin

Exploring Attention Mechanisms for Multimodal Emotion Recognition in an Emergency Call Center Corpus

Jun 12, 2023
Théo Deschamps-Berger, Lori Lamel, Laurence Devillers

Speech Enhancement for Virtual Meetings on Cellular Networks

Feb 02, 2023
Hojeong Lee, Minseon Gwak, Kawon Lee, Minjeong Kim, Joseph Konan, Ojas Bhargave

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

Feb 27, 2023
Jiyoung Lee, Joon Son Chung, Soo-Whan Chung

Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets

Jul 11, 2023
Denise Moussa, Germans Hirsch, Sebastian Wankerl, Christian Riess

SpeechLMScore: Evaluating speech generation using speech language model

Dec 08, 2022
Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

Apr 23, 2023
Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian

Investigating model performance in language identification: beyond simple error statistics

May 30, 2023
Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels

On the Audio-visual Synchronization for Lip-to-Speech Synthesis

Mar 01, 2023
Zhe Niu, Brian Mak
