"speech": models, code, and papers

Generalization Ability of MOS Prediction Networks

Oct 06, 2021
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi

Via arXiv

A scalable noisy speech dataset and online subjective test framework

Sep 17, 2019
Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke

Via arXiv

Multilingual Transfer Learning for Code-Switched Language and Speech Neural Modeling

Apr 13, 2021
Genta Indra Winata

Via arXiv

Punctuation Restoration

Feb 19, 2022
Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen

Via arXiv

Towards Learning Universal Audio Representations

Dec 01, 2021
Luyu Wang, Pauline Luc, Yan Wu, Adria Recasens, Lucas Smaira, Andrew Brock, Andrew Jaegle, Jean-Baptiste Alayrac, Sander Dieleman, Joao Carreira, Aaron van den Oord

Via arXiv

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

Apr 17, 2020
George Sterpu, Christian Saam, Naomi Harte

Via arXiv

Population Based Training for Data Augmentation and Regularization in Speech Recognition

Oct 08, 2020
Daniel Haziza, Jérémy Rapin, Gabriel Synnaeve

Via arXiv

InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

Apr 01, 2022
Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

Via arXiv

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

Apr 03, 2019
Kyubyong Park, Thomas Mulc

Via arXiv

Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

Oct 24, 2020
Ethan A. Chi, Julian Salazar, Katrin Kirchhoff

Via arXiv