Zhiyong Wu

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

Aug 07, 2025

Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation

Aug 07, 2025

A Multi-Stage Framework for Multimodal Controllable Speech Synthesis

Jun 26, 2025

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Jun 09, 2025

"In This Environment, As That Speaker": A Text-Driven Framework for Multi-Attribute Speech Conversion

Jun 08, 2025

WAKE: Watermarking Audio with Key Enrichment

Jun 06, 2025

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents

May 27, 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving

May 24, 2025

Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding

May 21, 2025