Ziyang Ma

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Apr 22, 2025

Model Hemorrhage and the Robustness Limits of Large Language Models

Mar 31, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Mar 11, 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Mar 03, 2025

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

Feb 25, 2025

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model

Jan 13, 2025

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

Jan 03, 2025

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Dec 23, 2024

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Dec 20, 2024

VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization

Dec 13, 2024