Zhiyong Wu

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Sep 10, 2025
Via arXiv

VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents

Sep 04, 2025
Via arXiv

Human Motion Video Generation: A Survey

Sep 04, 2025
Via arXiv

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Aug 12, 2025
Via arXiv

Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation

Aug 07, 2025
Via arXiv

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

Aug 07, 2025
Via arXiv

A Multi-Stage Framework for Multimodal Controllable Speech Synthesis

Jun 26, 2025
Via arXiv

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Jun 09, 2025
Via arXiv

"In This Environment, As That Speaker": A Text-Driven Framework for Multi-Attribute Speech Conversion

Jun 08, 2025
Via arXiv

WAKE: Watermarking Audio with Key Enrichment

Jun 06, 2025
Via arXiv