Ying Shan

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

Apr 23, 2026

OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video

Apr 13, 2026

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

Mar 31, 2026

Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels

Mar 05, 2026

CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

Mar 04, 2026

MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

Feb 09, 2026

VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Jan 08, 2026

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Dec 23, 2025

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Dec 16, 2025

MMhops-R1: Multimodal Multi-hop Reasoning

Dec 16, 2025