Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing them accurately.

Scriboora: Rethinking Human Pose Forecasting

Nov 19, 2025

A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus

Oct 26, 2025

Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges

Oct 22, 2025

StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction

Oct 21, 2025

Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking

Oct 10, 2025

Structured Sparsity and Weight-adaptive Pruning for Memory and Compute efficient Whisper models

Oct 14, 2025

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Oct 26, 2025

MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making

Nov 10, 2025

Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment

Oct 23, 2025

A Study of the Removability of Speaker-Adversarial Perturbations

Oct 10, 2025