Alert button

"speech recognition": models, code, and papers
Alert button

A large-scale multimodal dataset of human speech recognition

Add code
Bookmark button
Alert button
Mar 15, 2023
Yao Ge, Chong Tang, Haobo Li, Zikang Zhang, Wenda Li, Kevin Chetty, Daniele Faccio, Qammer H. Abbasi, Muhammad Imran

Figure 1 for A large-scale multimodal dataset of human speech recognition
Figure 2 for A large-scale multimodal dataset of human speech recognition
Figure 3 for A large-scale multimodal dataset of human speech recognition
Figure 4 for A large-scale multimodal dataset of human speech recognition
Viaarxiv icon

Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency

Jun 07, 2023
Shigeki Karita, Richard Sproat, Haruko Ishikawa

Figure 1 for Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency
Figure 2 for Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency
Figure 3 for Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency
Viaarxiv icon

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

Oct 01, 2023
Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

Figure 1 for Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Figure 2 for Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Viaarxiv icon

LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR

Add code
Bookmark button
Alert button
Sep 28, 2023
Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu

Figure 1 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Figure 2 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Figure 3 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Figure 4 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Viaarxiv icon

Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR

Sep 28, 2023
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Viaarxiv icon

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Add code
Bookmark button
Alert button
Mar 01, 2023
Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang

Figure 1 for MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Figure 2 for MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Figure 3 for MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Figure 4 for MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Viaarxiv icon

On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

May 25, 2023
Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju

Figure 1 for On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition
Figure 2 for On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition
Figure 3 for On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition
Viaarxiv icon

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model

Oct 06, 2023
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe

Figure 1 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Figure 2 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Figure 3 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Figure 4 for HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Viaarxiv icon

Considerations for Ethical Speech Recognition Datasets

May 03, 2023
Orestis Papakyriakopoulos, Alice Xiang

Viaarxiv icon

Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

Jun 20, 2023
Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei

Figure 1 for Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Figure 2 for Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Figure 3 for Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Figure 4 for Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Viaarxiv icon