"speech recognition": models, code, and papers

Capturing Multi-Resolution Context by Dilated Self-Attention

Apr 07, 2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux

An Effective End-to-End Modeling Approach for Mispronunciation Detection

May 18, 2020
Tien-Hong Lo, Shi-Yan Weng, Hsiu-Jui Chang, Berlin Chen

Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions

Apr 06, 2021
Jumon Nozaki, Tatsuya Komatsu

Noisy-target Training: A Training Strategy for DNN-based Speech Enhancement without Clean Speech

Jan 21, 2021
Takuya Fujimura, Yuma Koizumi, Kohei Yatabe, Ryoichi Miyazaki

Extremely Low Footprint End-to-End ASR System for Smart Device

Apr 06, 2021
Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios

Apr 06, 2021
Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer

Dual Script E2E framework for Multilingual and Code-Switching ASR

Jun 02, 2021
Mari Ganesh Kumar, Jom Kuriakose, Anand Thyagachandran, Arun Kumar A, Ashish Seth, Lodagala Durga Prasad, Saish Jaiswal, Anusha Prakash, Hema Murthy

Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

Apr 04, 2021
Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig

Learnable Frequency Filters for Speech Feature Extraction in Speaker Verification

Jun 15, 2022
Jingyu Li, Yusheng Tian, Tan Lee

TransfoRNN: Capturing the Sequential Information in Self-Attention Representations for Language Modeling

Apr 04, 2021
Tze Yuang Chong, Xuyang Wang, Lin Yang, Junjie Wang
