Runsen Xu

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Dec 11, 2025
Via arXiv

ChangingGrounding: 3D Visual Grounding in Changing Scenes

Oct 16, 2025
Via arXiv

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Aug 07, 2025
Via arXiv

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

Jul 10, 2025
Via arXiv

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

May 29, 2025
Via arXiv

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

May 22, 2025
Via arXiv

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

Oct 17, 2024
Via arXiv

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

Jun 13, 2024
Via arXiv

Grounded 3D-LLM with Referent Tokens

May 16, 2024
Via arXiv

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Dec 26, 2023
Via arXiv