speech


Designing Practical Models for Isolated Word Visual Speech Recognition

Aug 25, 2025, via arXiv

FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation

Aug 25, 2025, via arXiv

Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation

Aug 25, 2025, via arXiv

EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems

Aug 25, 2025, via arXiv

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Aug 25, 2025, via arXiv

Vocoder-Projected Feature Discriminator

Aug 25, 2025, via arXiv

Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance

Aug 25, 2025, via arXiv

Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters

Aug 25, 2025, via arXiv

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs

Aug 25, 2025, via arXiv

Dynamic Fusion Multimodal Network for SpeechWellness Detection

Aug 25, 2025, via arXiv