Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

Evaluating the Representation of Vowels in Wav2Vec Feature Extractor: A Layer-Wise Analysis Using MFCCs

Aug 25, 2025

Joint decoding method for controllable contextual speech recognition based on Speech LLM

Aug 12, 2025

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Aug 27, 2025

Talking to Robots: A Practical Examination of Speech Foundation Models for HRI Applications

Aug 25, 2025

Generative Annotation for ASR Named Entity Correction

Aug 28, 2025

NSPDI-SNN: An efficient lightweight SNN based on nonlinear synaptic pruning and dendritic integration

Aug 29, 2025

Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System

Aug 26, 2025

Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children's Speech?

Aug 28, 2025

NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation

Sep 04, 2025

Fairness of Automatic Speech Recognition: Looking Through a Philosophical Lens

Aug 13, 2025