"speech recognition": models, code, and papers

Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers

Jul 08, 2021
Huahuan Zheng, Wenjie Peng, Zhijian Ou, Jinsong Zhang

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

Oct 20, 2020
Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

Two-Pass End-to-End Speech Recognition

Aug 29, 2019
Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu

Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text

Jul 26, 2022
Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung

Innovative Bert-based Reranking Language Models for Speech Recognition

Apr 11, 2021
Shih-Hsuan Chiu, Berlin Chen

Speech Recognition for Endangered and Extinct Samoyedic languages

Dec 09, 2020
Niko Partanen, Mika Hämäläinen, Tiina Klooster

Fast and parallel decoding for transducer

Oct 31, 2022
Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition

Oct 15, 2019
Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H. L. Hansen

A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition

Jul 03, 2022
Ying Hu, Yuwu Tang, Hao Huang, Liang He

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

Jul 02, 2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux
