"speech recognition": models, code, and papers

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Via arXiv

Fast Word Error Rate Estimation Using Self-Supervised Representations For Speech And Text

Oct 12, 2023
Chanho Park, Chengsong Lu, Mingjie Chen, Thomas Hain

Via arXiv

SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

Jun 21, 2023
Changhun Kim, Joonhyung Park, Hajin Shim, Eunho Yang

Via arXiv

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Mar 01, 2023
Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang

Via arXiv

A large-scale multimodal dataset of human speech recognition

Mar 15, 2023
Yao Ge, Chong Tang, Haobo Li, Zikang Zhang, Wenda Li, Kevin Chetty, Daniele Faccio, Qammer H. Abbasi, Muhammad Imran

Via arXiv

Segmentation-Free Streaming Machine Translation

Sep 26, 2023
Javier Iranzo-Sánchez, Jorge Iranzo-Sánchez, Adrià Giménez, Jorge Civera, Alfons Juan

Via arXiv

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference

Oct 01, 2023
Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe

Via arXiv

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

May 11, 2023
Dima Rekesh, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Ankur Kumar, Boris Ginsburg

Via arXiv

Considerations for Ethical Speech Recognition Datasets

May 03, 2023
Orestis Papakyriakopoulos, Alice Xiang

Via arXiv

An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples

Oct 05, 2023
Armin Ettenhofer, Jan-Philipp Schulze, Karla Pizzi

Via arXiv