Pengxiang Ding

Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline

Feb 26, 2026

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

Dec 28, 2025

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Dec 10, 2025

Robust Online Residual Refinement via Koopman-Guided Dynamics Modeling

Sep 16, 2025

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Sep 11, 2025

Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation

Aug 28, 2025

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver

Aug 14, 2025

CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding

Jun 16, 2025

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

May 18, 2025

Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions

May 16, 2025