Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xianghui Wang

Dynamic Execution Commitment of Vision-Language-Action Models

May 12, 2026

Feng Chen, Xianghui Wang, Yuxuan Chen, Boying Li, Yefei He, Zeyu Zhang, Yicheng Wu

Abstract:Vision-Language-Action (VLA) models predominantly adopt action chunking, i.e., predicting and committing to a short horizon of consecutive low-level actions in a single forward pass, to amortize the inference cost of large-scale backbones and reduce per-step latency. However, committing these multi-step predictions to real-world execution requires balancing success rate against inference efficiency, a decision typically governed by fixed execution horizons tuned per task. Such heuristics ignore the state-dependent nature of predictive reliability, leading to brittle performance in dynamic or out-of-distribution settings. In this paper, we introduce A3, an Adaptive Action Acceptance mechanism that reframes dynamic execution commitment as a self-speculative prefix verification problem. A3 first computes a trajectory-wise consensus score of actions via group sampling, then selects a representative draft and prioritizes downstream verification. Specifically, it enforces: (1) consensus-ordered conditional invariance, which validates low-consensus actions by judging whether they remain consistent when re-decoded conditioned on high-consensus actions; and (2) prefix-closed sequential consistency, which guarantees physical rollout integrity by accepting only the longest continuous sequence of verified actions starting from the beginning. Consequently, the execution horizon emerges as the longest verifiable prefix satisfying both internal model logic and sequential execution constraints. Experiments across diverse VLA models and benchmarks demonstrate that A3 eliminates the need for manual horizon tuning while achieving a superior trade-off between execution robustness and inference throughput.

* Code will be released in the next version

Via

Access Paper or Ask Questions

Do We Really Need Immediate Resets? Rethinking Collision Handling for Efficient Robot Navigation

May 04, 2026

Shanze Wang, Xinming Zhang, Siwei Cheng, Xianghui Wang, Hailong Huang, Wei Zhang

Abstract:Should a single collision necessarily terminate an entire navigation episode? In most deep reinforcement learning (DRL) frameworks for robot navigation, this remains the standard practice: every collision immediately triggers a global environment reset and is penalized as a complete task failure. While a collision during deployment naturally indicates task failure, applying the same treatment during training prevents the agent from exploring challenging obstacle configurations, which slows learning progress in the early training phase. In this work, we challenge this convention and propose a Multi-Collision reset Budget (MCB) framework that decouples local collision termination from global environment resets, allowing the agent to retry difficult configurations within the same episode. Experiments on multiple simulated and real-world robotic platforms show that the framework accelerates early-stage exploration and improves both success rate and navigation efficiency over conventional single-collision reset baselines, with a small collision budget producing the largest gains.

* 7 pages, 7 figures

Via

Access Paper or Ask Questions

Learning from Demonstration with Failure Awareness for Safe Robot Navigation

Apr 25, 2026

Xianghui Wang, Siwei Cheng, Shanze Wang, Xinming Zhang, Dan Zhang, Wei Zhang

Abstract:Learning from demonstration is widely used for robot navigation, yet it suffers from a fundamental limitation: demonstrations consist predominantly of successful behaviors and provide limited coverage of unsafe states. This limitation leads to poor safety when the robot encounters scenarios beyond the demonstration distribution. Failure experiences, such as collisions, contain essential information about unsafe regions, but remain underutilized. The key difficulty lies in the fact that failure data do not provide valid guidance for action imitation, and their naive incorporation into policy learning often degrades performance. We address this challenge by proposing a failure-aware learning framework that explicitly decouples the roles of success and failure data. In this framework, failure experiences are used to shape value estimation in hazardous regions, while policy learning is restricted to successful demonstrations. This separation enables the effective use of failure data without corrupting policy behavior. We implement this design within an offline reinforcement learning (RL) setting and evaluate it in both simulation and real-world environments. The results show that our framework consistently reduces collision rates while preserving the task success rate, and demonstrate strong generalization across different environments and robot platforms.

Via

Access Paper or Ask Questions

MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos

May 13, 2025

Xianghui Wang, Xinming Zhang, Yanjun Chen, Xiaoyu Shen, Wei Zhang

Abstract:Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling locomotion skill learning from video demonstrations without the need for meticulous human-level reward design. However, the improper frame sampling method and low training efficiency of current methods remain a critical bottleneck, resulting in substantial computational overhead and time costs. To address this limitation, we propose Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos (MA-ROESL). MA-ROESL integrates a motion-aware frame selection method to implicitly enhance the quality of VLM-generated reward functions. It further employs a hybrid three-phase training pipeline that improves training efficiency via rapid reward optimization and derives the final policy through online fine-tuning. Experimental results demonstrate that MA-ROESL significantly enhances training efficiency while faithfully reproducing locomotion skills in both simulated and real-world settings, thereby underscoring its potential as a robust and scalable framework for efficient robot locomotion skill learning from video demonstrations.

Via

Access Paper or Ask Questions

Enhancing Deep Reinforcement Learning-based Robot Navigation Generalization through Scenario Augmentation

Mar 03, 2025

Shanze Wang, Mingao Tan, Zhibo Yang, Xianghui Wang, Xiaoyu Shen, Hailong Huang, Wei Zhang

Figure 1 for Enhancing Deep Reinforcement Learning-based Robot Navigation Generalization through Scenario Augmentation

Figure 2 for Enhancing Deep Reinforcement Learning-based Robot Navigation Generalization through Scenario Augmentation

Figure 3 for Enhancing Deep Reinforcement Learning-based Robot Navigation Generalization through Scenario Augmentation

Figure 4 for Enhancing Deep Reinforcement Learning-based Robot Navigation Generalization through Scenario Augmentation

Abstract:This work focuses on enhancing the generalization performance of deep reinforcement learning-based robot navigation in unseen environments. We present a novel data augmentation approach called scenario augmentation, which enables robots to navigate effectively across diverse settings without altering the training scenario. The method operates by mapping the robot's observation into an imagined space, generating an imagined action based on this transformed observation, and then remapping this action back to the real action executed in simulation. Through scenario augmentation, we conduct extensive comparative experiments to investigate the underlying causes of suboptimal navigation behaviors in unseen environments. Our analysis indicates that limited training scenarios represent the primary factor behind these undesired behaviors. Experimental results confirm that scenario augmentation substantially enhances the generalization capabilities of deep reinforcement learning-based navigation systems. The improved navigation framework demonstrates exceptional performance by producing near-optimal trajectories with significantly reduced navigation time in real-world applications.

* 8 pages, 9 figures

Via

Access Paper or Ask Questions

Corrected Soft Actor Critic for Continuous Control

Oct 22, 2024

Yanjun Chen, Xinming Zhang, Xianghui Wang, Zhiqiang Xu, Xiaoyu Shen, Wei Zhang

Figure 1 for Corrected Soft Actor Critic for Continuous Control

Figure 2 for Corrected Soft Actor Critic for Continuous Control

Figure 3 for Corrected Soft Actor Critic for Continuous Control

Figure 4 for Corrected Soft Actor Critic for Continuous Control

Abstract:The Soft Actor-Critic (SAC) algorithm is known for its stability and high sample efficiency in deep reinforcement learning. However, the tanh transformation applied to sampled actions in SAC distorts the action distribution, hindering the selection of the most probable actions. This paper presents a novel action sampling method that directly identifies and selects the most probable actions within the transformed distribution, thereby addressing this issue. Extensive experiments on standard continuous control benchmarks demonstrate that the proposed method significantly enhances SAC's performance, resulting in faster convergence and higher cumulative rewards compared to the original algorithm.

Via

Access Paper or Ask Questions

Achieving Stable High-Speed Locomotion for Humanoid Robots with Deep Reinforcement Learning

Sep 25, 2024

Xinming Zhang, Xianghui Wang, Lerong Zhang, Guodong Guo, Xiaoyu Shen, Wei Zhang

Abstract:Humanoid robots offer significant versatility for performing a wide range of tasks, yet their basic ability to walk and run, especially at high velocities, remains a challenge. This letter presents a novel method that combines deep reinforcement learning with kinodynamic priors to achieve stable locomotion control (KSLC). KSLC promotes coordinated arm movements to counteract destabilizing forces, enhancing overall stability. Compared to the baseline method, KSLC provides more accurate tracking of commanded velocities and better generalization in velocity control. In simulation tests, the KSLC-enabled humanoid robot successfully tracked a target velocity of 3.5 m/s with reduced fluctuations. Sim-to-sim validation in a high-fidelity environment further confirmed its robust performance, highlighting its potential for real-world applications.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions