"speech recognition": models, code, and papers

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Jul 06, 2022
Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung

Bengali Common Voice Speech Dataset for Automatic Speech Recognition

Jun 29, 2022
Samiul Alam, Asif Sushmit, Zaowad Abdullah, Shahrin Nakkhatra, MD. Nazmuddoha Ansary, Syed Mobassir Hossen, Sazia Morshed Mehnaz, Tahsin Reasat, Ahmed Imtiaz Humayun

DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization

Aug 04, 2023
Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, Xing Xie

Automatic Speech recognition for Speech Assessment of Preschool Children

Mar 24, 2022
Amirhossein Abaskohi, Fatemeh Mortazavi, Hadi Moradi

Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

Jul 22, 2022
Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

Feb 21, 2023
Leyuan Qu, Cornelius Weber, Stefan Wermter

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

Sep 03, 2023
Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

Sep 05, 2022
Jiadong Wang, Xinyuan Qian, Haizhou Li

Can Generative Large Language Models Perform ASR Error Correction?

Jul 09, 2023
Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill

Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion

Jul 01, 2023
Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro
