speech


Discrete optimal transport is a strong audio adversarial attack

Sep 18, 2025
Via arXiv

Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech

Sep 18, 2025

BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition

Sep 18, 2025

Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech

Sep 18, 2025

Llama-Mimi: Speech Language Models with Interleaved Semantic and Acoustic Tokens

Sep 18, 2025

Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

Sep 18, 2025

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation

Sep 18, 2025

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Sep 18, 2025

Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Sep 18, 2025

Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection

Sep 17, 2025