
Zuxuan Wu

Fudan University

HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models

Apr 14, 2026
Via arXiv

CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation

Apr 10, 2026
Via arXiv

Steering the Verifiability of Multimodal AI Hallucinations

Apr 08, 2026
Via arXiv

HAD: Combining Hierarchical Diffusion with Metric-Decoupled RL for End-to-End Driving

Apr 04, 2026
Via arXiv

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

Mar 12, 2026
Via arXiv

WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing

Mar 12, 2026
Via arXiv

FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding

Mar 02, 2026
Via arXiv

Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference

Mar 02, 2026
Via arXiv

Learning Accurate Segmentation Purely from Self-Supervision

Feb 27, 2026
Via arXiv

UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling

Feb 25, 2026
Via arXiv