"speech recognition": models, code, and papers

Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems

Feb 29, 2024
Quentin Raymondaud, Mickael Rouvier, Richard Dufour

Via arXiv

An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement

Feb 27, 2024
Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Chi-Han Lin, Berlin Chen

Via arXiv

Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models

Mar 18, 2024
Linus Nwankwo, Elmar Rueckert

Via arXiv

How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

Feb 20, 2024
Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

Via arXiv

Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

Feb 20, 2024
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Via arXiv

Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

Mar 11, 2024
Keshara Weerasinghe, Saahith Janapati, Xueren Ge, Sion Kim, Sneha Iyer, John A. Stankovic, Homa Alemzadeh

Via arXiv

A Comprehensive Study of the Current State-of-the-Art in Nepali Automatic Speech Recognition Systems

Feb 05, 2024
Rupak Raj Ghimire, Bal Krishna Bal, Prakash Poudyal

Via arXiv

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Feb 20, 2024
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe

Via arXiv

Speech emotion recognition from voice messages recorded in the wild

Mar 04, 2024
Lucía Gómez-Zaragozá, Óscar Valls, Rocío del Amor, María José Castro-Bleda, Valery Naranjo, Mariano Alcañiz Raya, Javier Marín-Morales

Via arXiv

SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages

Mar 14, 2024
René Groh, Nina Goes, Andreas M. Kist

Via arXiv