Xiangtai Li

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Oct 30, 2025

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Oct 23, 2025

From Masks to Worlds: A Hitchhiker's Guide to World Models

Oct 23, 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025

One Flight Over the Gap: A Survey from Perspective to Panoramic Vision

Sep 04, 2025

PointDGRWKV: Generalizing RWKV-like Architecture to Unseen Domains for Point Cloud Classification

Aug 29, 2025

Human-in-Context: Unified Cross-Domain 3D Human Motion Modeling via In-Context Learning

Aug 14, 2025

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Jul 10, 2025

Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning

Jul 02, 2025

Dense360: Dense Understanding from Omnidirectional Panoramas

Jun 17, 2025