Picture for Zhiyong Wu

Zhiyong Wu

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

Add code
Mar 24, 2026
Viaarxiv icon

PROMO: Promptable Outfitting for Efficient High-Fidelity Virtual Try-On

Add code
Mar 12, 2026
Viaarxiv icon

Kling-MotionControl Technical Report

Add code
Mar 03, 2026
Viaarxiv icon

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction

Add code
Jan 06, 2026
Viaarxiv icon

From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing

Add code
Dec 31, 2025
Viaarxiv icon

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Add code
Nov 10, 2025
Figure 1 for E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Figure 2 for E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Figure 3 for E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Figure 4 for E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Viaarxiv icon

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Add code
Sep 10, 2025
Viaarxiv icon

VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents

Add code
Sep 04, 2025
Viaarxiv icon

Human Motion Video Generation: A Survey

Add code
Sep 04, 2025
Viaarxiv icon

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models

Add code
Aug 12, 2025
Viaarxiv icon