speech


Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech

Sep 18, 2025
Via arXiv

Llama-Mimi: Speech Language Models with Interleaved Semantic and Acoustic Tokens

Sep 18, 2025

Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

Sep 18, 2025

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation

Sep 18, 2025

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Sep 18, 2025

Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Sep 18, 2025

HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Sep 18, 2025

From Who Said What to Who They Are: Modular Training-free Identity-Aware LLM Refinement of Speaker Diarization

Sep 18, 2025

Frustratingly Easy Data Augmentation for Low-Resource ASR

Sep 18, 2025

Emotion-Aware Speech Generation with Character-Specific Voices for Comics

Sep 18, 2025