Zhiyong Wu

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Aug 12, 2025
Via arXiv

Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation

Aug 07, 2025
Via arXiv

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

Aug 07, 2025
Via arXiv

A Multi-Stage Framework for Multimodal Controllable Speech Synthesis

Jun 26, 2025
Via arXiv

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Jun 09, 2025
Via arXiv

"In This Environment, As That Speaker": A Text-Driven Framework for Multi-Attribute Speech Conversion

Jun 08, 2025
Via arXiv

WAKE: Watermarking Audio with Key Enrichment

Jun 06, 2025
Via arXiv

VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents

May 27, 2025
Via arXiv

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025
Via arXiv

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving

May 24, 2025
Via arXiv