"speech recognition": models, code, and papers

Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer

Mar 29, 2022
Jingyu Sun, Guiping Zhong, Dinghao Zhou, Baoxiang Li
Via arXiv

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search

Nov 16, 2022
Zihan Wang, Qi Meng, HaiFeng Lan, XinRui Zhang, KeHao Guo, Akshat Gupta
Via arXiv

Multi-Channel Transformer Transducer for Speech Recognition

Aug 30, 2021
Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo
Via arXiv

Code-Switching Text Generation and Injection in Mandarin-English ASR

Mar 20, 2023
Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng
Via arXiv

Speech-enhanced and Noise-aware Networks for Robust Speech Recognition

Mar 25, 2022
Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang
Via arXiv

Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model

Mar 13, 2023
Shuangping Huang, Yu Luo, Zhenzhou Zhuang, Jin-Gang Yu, Mengchao He, Yongpan Wang
Via arXiv

Korean Tokenization for Beam Search Rescoring in Speech Recognition

Feb 22, 2022
Kyuhong Shim, Hyewon Bae, Wonyong Sung
Via arXiv

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Oct 21, 2021
Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel
Via arXiv

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Feb 10, 2023
Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli
Via arXiv

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

May 16, 2023
Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng
Via arXiv