Bin Xia

Divide and Conquer: Decoupled Representation Alignment for Multimodal World Models

May 03, 2026
Via arXiv

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Apr 24, 2026
Via arXiv

Index-ASR Technical Report

Dec 31, 2025
Via arXiv

DreamOmni3: Scribble-based Editing and Generation

Dec 27, 2025
Via arXiv

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Dec 09, 2025
Via arXiv

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Dec 08, 2025
Via arXiv

DreamVE: Unified Instruction-based Image and Video Editing

Aug 08, 2025
Via arXiv

Training-Free Efficient Video Generation via Dynamic Token Carving

May 22, 2025
Via arXiv

GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning

Mar 04, 2025
Via arXiv

DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration

Jan 17, 2025
Via arXiv