Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

May 26, 2025
Via arXiv

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing

May 22, 2025
Via arXiv

Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement

May 07, 2025
Via arXiv

Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech

May 06, 2025
Via arXiv

Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations

May 08, 2025
Via arXiv

Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer

May 23, 2025
Via arXiv

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer

May 07, 2025
Via arXiv

Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients

May 09, 2025
Via arXiv

CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization

May 06, 2025
Via arXiv

Empirical Analysis of Asynchronous Federated Learning on Heterogeneous Devices: Efficiency, Fairness, and Privacy Trade-offs

May 11, 2025
Via arXiv