"speech": models, code, and papers

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

May 09, 2022
Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang

Via arXiv

Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments

May 17, 2022
Joe Caroselli, Arun Narayanan, Yiteng Huang

Via arXiv

An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations

Dec 19, 2022
Shelly Jain, Priyanshi Pal, Anil Vuppala, Prasanta Ghosh, Chiranjeevi Yarra

Via arXiv

Toward a realistic model of speech processing in the brain with self-supervised learning

Jun 03, 2022
Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King

Via arXiv

Analysis of EEG frequency bands for Envisioned Speech Recognition

Mar 29, 2022
Ayush Tripathi

Via arXiv

On the Impact of Noises in Crowd-Sourced Data for Speech Translation

Jun 28, 2022
Siqi Ouyang, Rong Ye, Lei Li

Via arXiv

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Jan 13, 2022
Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg

Via arXiv

Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise

Mar 29, 2022
Tuomo Raitio, Petko Petkov, Jiangchuan Li, Muhammed Shifas, Andrea Davis, Yannis Stylianou

Via arXiv

Can Self-Supervised Learning solve the problem of child speech recognition?

Apr 06, 2022
Rishabh Jain, Mariam Yiwere, Dan Bigioi, Peter Corcoran

Via arXiv

ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale

Jul 22, 2022
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure

Via arXiv