"speech": models, code, and papers

Parallel Synthesis for Autoregressive Speech Generation

Apr 25, 2022
Po-chun Hsu, Da-rong Liu, Andy T. Liu, Hung-yi Lee

Via arXiv

On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering

Sep 26, 2022
Georgios Sidiropoulos, Svitlana Vakulenko, Evangelos Kanoulas

Via arXiv

Contrastive Representation Learning for Acoustic Parameter Estimation

Feb 22, 2023
Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets

Via arXiv

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Nov 07, 2022
Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

Via arXiv

Inference skipping for more efficient real-time speech enhancement with parallel RNNs

Jul 22, 2022
Xiaohuai Le, Tong Lei, Kai Chen, Jing Lu

Via arXiv

Challenges and Opportunities in Multi-device Speech Processing

Jun 27, 2022
Gregory Ciccarelli, Jarred Barber, Arun Nair, Israel Cohen, Tao Zhang

Via arXiv

A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis

Apr 04, 2022
Rishabh Jain, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu

Via arXiv

Universal Speech Enhancement with Score-based Diffusion

Jun 07, 2022
Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini

Via arXiv

DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions

Mar 14, 2023
Geumbyeol Hwang, Sunwon Hong, Seunghyun Lee, Sungwoo Park, Gyeongsu Chae

Via arXiv

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

Mar 14, 2023
Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng

Via arXiv