Ying Shan

VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Jan 08, 2026

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Dec 23, 2025

MMhops-R1: Multimodal Multi-hop Reasoning

Dec 16, 2025

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Dec 16, 2025

ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

Nov 18, 2025

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Aug 27, 2025

ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

Aug 14, 2025

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

Jul 28, 2025

DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation

Jul 02, 2025

IC-Custom: Diverse Image Customization via In-Context Learning

Jul 02, 2025