Zhiyong Wu

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Jun 09, 2025
Via arXiv

"In This Environment, As That Speaker": A Text-Driven Framework for Multi-Attribute Speech Conversion

Jun 08, 2025
Via arXiv

WAKE: Watermarking Audio with Key Enrichment

Jun 06, 2025
Via arXiv

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents

May 27, 2025
Via arXiv

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025
Via arXiv

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving

May 24, 2025
Via arXiv

Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding

May 21, 2025
Via arXiv

Seed1.5-VL Technical Report

May 11, 2025
Via arXiv

AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis

Apr 14, 2025
Via arXiv

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Apr 11, 2025
Via arXiv