"speech": models, code, and papers

What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions

Apr 10, 2024
Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

Mar 21, 2024
HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Apr 16, 2024
Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

Mar 27, 2024
Xilin Jiang, Cong Han, Nima Mesgarani

Driving Animatronic Robot Facial Expression From Speech

Mar 21, 2024
Boren Li, Hang Li, Hangxin Liu

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

Apr 03, 2024
Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset

Mar 28, 2024
Janis Goldzycher, Paul Röttger, Gerold Schneider

Resilience of Large Language Models for Noisy Instructions

Apr 15, 2024
Bin Wang, Chengwei Wei, Zhengyuan Liu, Geyu Lin, Nancy F. Chen

Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression

Apr 08, 2024
Ernst Seidel, Pejman Mowlaee, Tim Fingscheidt

Crowdsourced Multilingual Speech Intelligibility Testing

Mar 21, 2024
Laura Lechler, Kamil Wojcicki
