Tai Wang

InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation

Jan 05, 2026

VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs

Dec 31, 2025

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

Dec 23, 2025

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Dec 11, 2025

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

Dec 09, 2025

ChangingGrounding: 3D Visual Grounding in Changing Scenes

Oct 16, 2025

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Aug 07, 2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

Jul 23, 2025

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Jul 17, 2025

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

Jul 10, 2025