"speech recognition": models, code, and papers

Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition

Apr 03, 2023
Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

Feb 28, 2023
Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

Jul 24, 2023
Muhammad Danyal Khan, Raheem Ali, Arshad Aziz

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition

Apr 24, 2023
Mohan Li, Rama Doddipatla, Catalin Zorila

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

Jul 19, 2023
Javad Peymanfard, Vahid Saeedi, Mohammad Reza Mohammadi, Hossein Zeinali, Nasser Mozayani

Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition

Mar 23, 2023
Kai Liu, Hailiang Xiong, Gangqiang Yang, Zhengfeng Du, Yewen Cao, Danyal Shah

GRASS: Unified Generation Model for Speech-to-Semantic Tasks

Sep 11, 2023
Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai

Injecting Categorical Labels and Syntactic Information into Biomedical NER

Nov 06, 2023
Sumam Francis, Marie-Francine Moens

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

Mar 05, 2023
Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng

End-to-End Speech Recognition: A Survey

Mar 03, 2023
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe
