Ziyang Ma

Evaluating the Expressive Appropriateness of Speech in Rich Contexts

May 10, 2026
Via arXiv

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

May 07, 2026
Via arXiv

Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse

Apr 06, 2026
Via arXiv

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation

Mar 16, 2026
Via arXiv

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Feb 15, 2026
Via arXiv

Audio ControlNet for Fine-Grained Audio Generation and Editing

Feb 04, 2026
Via arXiv

From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development

Jan 31, 2026
Via arXiv

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026
Via arXiv

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

Jan 06, 2026
Via arXiv

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis

Dec 21, 2025
Via arXiv