"speech": models, code, and papers

An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis

Mar 19, 2024
Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

Via arXiv

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

Apr 03, 2024
Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho

Via arXiv

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

Mar 21, 2024
HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

Via arXiv

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

Mar 27, 2024
Xilin Jiang, Cong Han, Nima Mesgarani

Via arXiv

ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana

Apr 12, 2024
Monica Romero, Sandra Gomez, Iván G. Torre

Via arXiv

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset

Mar 28, 2024
Janis Goldzycher, Paul Röttger, Gerold Schneider

Via arXiv

Driving Animatronic Robot Facial Expression From Speech

Mar 21, 2024
Boren Li, Hang Li, Hangxin Liu

Via arXiv

Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation

Apr 04, 2024
Luca Comanducci, Fabio Antonacci, Augusto Sarti

Via arXiv

Crowdsourced Multilingual Speech Intelligibility Testing

Mar 21, 2024
Laura Lechler, Kamil Wojcicki

Via arXiv

Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

Apr 12, 2024
Hassan Ali, Philipp Allgeuer, Stefan Wermter

Via arXiv