"speech": models, code, and papers

Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

Jun 18, 2021
Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

Dec 31, 2020
Wei-Ning Hsu, David Harwath, Christopher Song, James Glass

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis

May 17, 2021
Erica Cooper, Xin Wang, Junichi Yamagishi

Discriminative Multi-modality Speech Recognition

May 13, 2020
Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang

Enabling On-Device Training of Speech Recognition Models with Federated Dropout

Oct 07, 2021
Dhruv Guliani, Lillian Zhou, Changwan Ryu, Tien-Ju Yang, Harry Zhang, Yonghui Xiao, Françoise Beaufays, Giovanni Motta

Prosodic Alignment for off-screen automatic dubbing

Apr 06, 2022
Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote

Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications

Jan 14, 2021
Yoo Rhee Oh, Kiyoung Park, Jeon Gyu Park

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

Feb 07, 2021
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

Apr 03, 2021
Junbo Zhang, Zhiwen Zhang, Yongqing Wang, Zhiyong Yan, Qiong Song, Yukai Huang, Ke Li, Daniel Povey, Yujun Wang

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

Aug 27, 2022
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu
