"speech": models, code, and papers

Audio-Visual Neural Syntax Acquisition

Oct 11, 2023
Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Sep 04, 2023
Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng

LaughTalk: Expressive 3D Talking Head Generation with Laughter

Nov 02, 2023
Kim Sung-Bin, Lee Hyun, Da Hye Hong, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

Multilingual Speech-to-Speech Translation into Multiple Target Languages

Jul 17, 2023
Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

Sep 10, 2023
Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu

Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?

Sep 12, 2023
Xin Wang, Junichi Yamagishi

Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio

Nov 01, 2023
Xudong Xu, Dejan Markovic, Jacob Sandakly, Todd Keebler, Steven Krenn, Alexander Richard

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Aug 31, 2023
Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Aug 29, 2023
Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin

Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model

Aug 14, 2023
Fan Zhang, Naye Ji, Fuxing Gao, Siyuan Zhao, Zhaohan Wang, Shunman Li