Speech recognition


Speech recognition is the task of identifying the words in spoken audio and transcribing them accurately as text, drawing on both the acoustic signal and the language being spoken.

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization

Sep 19, 2025
Via arXiv

Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

Sep 18, 2025
Via arXiv

From Who Said What to Who They Are: Modular Training-free Identity-Aware LLM Refinement of Speaker Diarization

Sep 18, 2025
Via arXiv

Few-shot Personalization via In-Context Learning for Speech Emotion Recognition based on Speech-Language Model

Sep 10, 2025
Via arXiv

Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Untowered Airspace

Sep 17, 2025
Via arXiv

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

Sep 17, 2025
Via arXiv

Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Sep 11, 2025
Via arXiv

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

Sep 17, 2025
Via arXiv

Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition

Sep 10, 2025
Via arXiv

Towards Improved Speech Recognition through Optimized Synthetic Data Generation

Aug 29, 2025
Via arXiv