Speech Recognition


Speech recognition is the task of identifying the words in spoken audio and transcribing them accurately as text, which involves analyzing both the speaker's voice and the language being used.

Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects

May 26, 2025, via arXiv

Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio

May 15, 2025, via arXiv

SEED: Speaker Embedding Enhancement Diffusion Model

May 22, 2025, via arXiv

Fairness of Automatic Speech Recognition in Cleft Lip and Palate Speech

May 06, 2025, via arXiv

AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection

May 26, 2025, via arXiv

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing

May 22, 2025, via arXiv

Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations

May 08, 2025, via arXiv

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

May 26, 2025, via arXiv

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer

May 07, 2025, via arXiv

Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments

May 02, 2025, via arXiv