Ziwei Liu

Nanyang Technological University

Scaling Spatial Intelligence with Multimodal Foundation Models

Nov 17, 2025
Via arXiv

OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer

Nov 14, 2025
Via arXiv

Simulating the Visual World with Artificial Intelligence: A Roadmap

Nov 11, 2025
Via arXiv

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Oct 30, 2025
Via arXiv

SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

Oct 30, 2025
Via arXiv

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction

Oct 26, 2025
Via arXiv

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Oct 16, 2025
Via arXiv

RealDPO: Real or Not Real, that is the Preference

Oct 16, 2025
Via arXiv

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Oct 14, 2025
Via arXiv

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

Oct 06, 2025
Via arXiv