"speech recognition": models, code, and papers

Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition

Aug 09, 2021
Arash Dehghani, Seyyed Ali Seyyedsalehi

Environmental Noise Embeddings for Robust Speech Recognition

Sep 29, 2016
Suyoun Kim, Bhiksha Raj, Ian Lane

Is Lip Region-of-Interest Sufficient for Lipreading?

May 28, 2022
Jing-Xuan Zhang, Gen-Shun Wan, Jia Pan

A Wavelet Transform Based Scheme to Extract Speech Pitch and Formant Frequencies

Sep 07, 2022
Seyedamiryousef Hosseini Goki, Mahdieh Ghazvini, Sajad Hamzenejadi

Thai Wav2Vec2.0 with CommonVoice V8

Aug 09, 2022
Wannaphong Phatthiyaphaibun, Chompakorn Chaksangchaichot, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong

A comparison of end-to-end models for long-form speech recognition

Nov 06, 2019
Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Jun 03, 2021
Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Oct 18, 2022
Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen

Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation

Oct 18, 2022
Chen Wang, Yuchen Liu, Boxing Chen, Jiajun Zhang, Wei Luo, Zhongqiang Huang, Chengqing Zong

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

May 30, 2020
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
