"speech recognition": models, code, and papers
Multilingual Speech Models for Automatic Speech Recognition Exhibit Gender Performance Gaps

Feb 28, 2024
Giuseppe Attanasio, Beatrice Savoldi, Dennis Fucci, Dirk Hovy

Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview

Mar 01, 2024
Heyang Liu, Yu Wang, Yanfeng Wang

Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation

Mar 19, 2024
Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Mar 21, 2024
Dominik Wager, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Mar 21, 2024
Dongyeong Hwang, Hyunju Kim, Sunwoo Kim, Kijung Shin

Open Access NAO (OAN): a ROS2-based software framework for HRI applications with the NAO robot

Mar 20, 2024
Antonio Bono, Kenji Brameld, Luigi D'Alfonso, Giuseppe Fedele

Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR

Mar 16, 2024
Savitha Murthy, Dinkar Sitaram

Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism

Mar 07, 2024
Xiaoyu Tang, Yixin Lin, Ting Dang, Yuanfang Zhang, Jintao Cheng

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Mar 21, 2024
Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

Mar 02, 2024
Tyler Benster, Guy Wilson, Reshef Elisha, Francis R Willett, Shaul Druckmann
