Picture for Ziyang Ma

Ziyang Ma

Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens

Add code
Jun 10, 2025
Viaarxiv icon

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

Add code
May 26, 2025
Viaarxiv icon

Towards Reliable Large Audio Language Model

Add code
May 25, 2025
Viaarxiv icon

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

Add code
May 22, 2025
Viaarxiv icon

Towards Efficient Multi-Scale Deformable Attention on NPU

Add code
May 20, 2025
Viaarxiv icon

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Add code
May 19, 2025
Viaarxiv icon

Towards Flow-Matching-based TTS without Classifier-Free Guidance

Add code
Apr 29, 2025
Viaarxiv icon

Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation

Add code
Apr 27, 2025
Viaarxiv icon

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Add code
Apr 22, 2025
Viaarxiv icon

Model Hemorrhage and the Robustness Limits of Large Language Models

Add code
Mar 31, 2025
Viaarxiv icon