speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder

Add code
Mar 09, 2026
Viaarxiv icon

DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units

Add code
Mar 19, 2026
Viaarxiv icon

Acoustic and Semantic Modeling of Emotion in Spoken Language

Add code
Mar 10, 2026
Viaarxiv icon

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

Add code
Mar 11, 2026
Viaarxiv icon

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies

Add code
Mar 11, 2026
Viaarxiv icon

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

Add code
Mar 12, 2026
Viaarxiv icon

NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction

Add code
Mar 11, 2026
Viaarxiv icon

The Patrologia Graeca Corpus: OCR, Annotation, and Open Release of Noisy Nineteenth-Century Polytonic Greek Editions

Add code
Mar 10, 2026
Viaarxiv icon

BROTHER: Behavioral Recognition Optimized Through Heterogeneous Ensemble Regularization for Ambivalence and Hesitancy

Add code
Mar 15, 2026
Viaarxiv icon

Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach

Add code
Mar 13, 2026
Viaarxiv icon