Ziyang Ma

Speech Meets ELF: Audio Conditional Continuous-Target Diffusion for Speech Recognition and Translation

Jun 09, 2026 (via arXiv)

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

Jun 05, 2026 (via arXiv)

MMAE: A Massive Multitask Audio Editing Benchmark

Jun 05, 2026 (via arXiv)

Audio Interaction Model

Jun 03, 2026 (via arXiv)

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Jun 02, 2026 (via arXiv)

CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems

May 28, 2026 (via arXiv)

Proactive for Uncertainty: Cause-Aware Error Diagnosis and Interactive Clarification for Spoken Dialogue Systems

May 25, 2026 (via arXiv)

Evaluating the Expressive Appropriateness of Speech in Rich Contexts

May 10, 2026 (via arXiv)

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

May 07, 2026 (via arXiv)

Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse

Apr 06, 2026 (via arXiv)