"speech recognition": models, code, and papers

Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

May 30, 2023
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang Zhang

Via arXiv

Agricultural Robotic System: The Automation of Detection and Speech Control

Jul 19, 2023
Yang Wenkai, Ji Ruihang, Yue Yiran, Gu Zhonghan, Shu Wanyang, Sam Ge Shuzhi

Via arXiv

MSAC: Multiple Speech Attribute Control Method for Speech Emotion Recognition

Aug 08, 2023
Yu Pan

Via arXiv

Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning

May 23, 2023
Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima

Via arXiv

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

Sep 19, 2023
Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Via arXiv

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

Jun 09, 2023
Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma

Via arXiv

AVATAR: Robust Voice Search Engine Leveraging Autoregressive Document Retrieval and Contrastive Learning

Sep 04, 2023
Yi-Cheng Wang, Tzu-Ting Yang, Hsin-Wei Wang, Bi-Cheng Yan, Berlin Chen

Via arXiv

TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition

May 23, 2023
Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu

Via arXiv

Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks

Aug 30, 2023
Payal Mohapatra, Akash Pandey, Yueyuan Sui, Qi Zhu

Via arXiv

OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

May 15, 2023
Fazle Rabbi Rakib, Souhardya Saha Dip, Samiul Alam, Nazia Tasnim, Md. Istiak Hossain Shihab, Md. Nazmuddoha Ansary, Syed Mobassir Hossen, Marsia Haque Meghla, Mamunur Mamun, Farig Sadeque, Sayma Sultana Chowdhury, Tahsin Reasat, Asif Sushmit, Ahmed Imtiaz Humayun

Via arXiv