Speech Recognition


Speech recognition is the task of identifying the words in spoken audio and transcribing them accurately as text, which involves analyzing both the speaker's voice and the language being spoken.

SpectroFusion-ViT: A Lightweight Transformer for Speech Emotion Recognition Using Harmonic Mel-Chroma Fusion

Feb 28, 2026

The USTC-NERCSLIP Systems for the CHiME-9 MCoRec Challenge

Mar 02, 2026

Acoustic and Semantic Modeling of Emotion in Spoken Language

Mar 10, 2026

Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition

Feb 22, 2026

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

Mar 11, 2026

End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation

Mar 02, 2026

Polynomial Mixing for Efficient Self-supervised Speech Encoders

Feb 28, 2026

Dialect and Gender Bias in YouTube's Spanish Captioning System

Feb 27, 2026

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Feb 27, 2026

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

Mar 12, 2026