"speech recognition": models, code, and papers

The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese

Feb 12, 2024
Ajinkya Kulkarni, Anna Tokareva, Rameez Qureshi, Miguel Couceiro

Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition

Feb 08, 2024
Lei Liu, Li Liu, Haizhou Li

A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings

Feb 15, 2024
Hyewon Han, Naveen Kumar

Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru

Feb 18, 2024
Zining Wang, Paul Reisert, Eric Nichols, Randy Gomez

Unified Speech-Text Pretraining for Spoken Dialog Modeling

Feb 08, 2024
Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Jan 11, 2024
Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Significance of Chirp MFCC as a Feature in Speech and Audio Applications

Feb 19, 2024
S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan

How Paralingual are Paralinguistic Representations? A Case Study in Speech Emotion Recognition

Feb 02, 2024
Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, Rajesh Sharma

Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?

Jan 10, 2024
Changye Li, Weizhe Xu, Trevor Cohen, Serguei Pakhomov

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

Jan 11, 2024
Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
