"speech recognition": models, code, and papers

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

Oct 30, 2020
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project

Jun 15, 2022
Jan Lehečka, Josef V. Psutka, Josef Psutka

Contextual Speech Recognition with Difficult Negative Training Examples

Oct 29, 2018
Uri Alon, Golan Pundak, Tara N. Sainath

Serialized Output Training for End-to-End Overlapped Speech Recognition

Mar 28, 2020
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka

Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

Oct 20, 2021
Mun-Hak Lee, Joon-Hyuk Chang

Semantic Mask for Transformer based End-to-End Speech Recognition

Dec 06, 2019
Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Oct 18, 2021
Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks

Oct 26, 2022
Colin Leong, Joshua Nemecek, Jacob Mansdorfer, Anna Filighera, Abraham Owodunni, Daniel Whitenack

When Is TTS Augmentation Through a Pivot Language Useful?

Jul 20, 2022
Nathaniel Robinson, Perez Ogayo, Swetha Gangu, David R. Mortensen, Shinji Watanabe

End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection

Feb 14, 2020
Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, Shinji Watanabe