Zhiyong Wu

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction

Jan 06, 2026

From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing

Dec 31, 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Nov 10, 2025

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Sep 10, 2025

VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents

Sep 04, 2025

Human Motion Video Generation: A Survey

Sep 04, 2025

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Aug 12, 2025

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding

Aug 07, 2025

Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation

Aug 07, 2025

A Multi-Stage Framework for Multimodal Controllable Speech Synthesis

Jun 26, 2025