Haoyuan Shi

Current World Models Lack a Persistent State Core

Jun 18, 2026

VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing

May 28, 2026

Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

May 14, 2026

MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation

Feb 27, 2026

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

Nov 16, 2025

Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation

Sep 26, 2025

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Jun 12, 2025

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

May 08, 2025

AI Awareness

Apr 25, 2025

VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

Apr 23, 2025