Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuoyu Yue

ALOE: Action-Level Off-Policy Evaluation for Vision-Language-Action Model Post-Training

Feb 13, 2026

Rushuai Yang, Hecheng Wang, Chiming Liu, Xiaohan Yan, Yunlong Wang, Xuan Du, Shuoyu Yue, Yongcheng Liu, Chuheng Zhang, Lizhe Qi(+3 more)

Abstract:We study how to improve large foundation vision-language-action (VLA) systems through online reinforcement learning (RL) in real-world settings. Central to this process is the value function, which provides learning signals to guide VLA learning from experience. In practice, the value function is estimated from trajectory fragments collected from different data sources, including historical policies and intermittent human interventions. Estimating the value function of current behavior quality from the mixture data is inherently an off-policy evaluation problem. However, prior work often adopts conservative on-policy estimation for stability, which avoids direct evaluation of the current high-capacity policy and limits learning effectiveness. In this paper, we propose ALOE, an action-level off-policy evaluation framework for VLA post-training. ALOE applies chunking-based temporal-difference bootstrapping to evaluate individual action sequences instead of predicting final task outcomes. This design improves effective credit assignment to critical action chunks under sparse rewards and supports stable policy improvement. We evaluate our method on three real-world manipulation tasks, including smartphone packing as a high-precision task, laundry folding as a long-horizon deformable-object task, and bimanual pick-and-place involving multi-object perception. Across all tasks, ALOE improves learning efficiency without compromising execution speed, showing that off-policy RL can be reintroduced in a reliable manner for real-world VLA post-training. Videos and additional materials are available at our project website.

Via

Access Paper or Ask Questions

Memory-Efficient 2D/3D Shape Assembly of Robot Swarms

Sep 30, 2025

Shuoyu Yue, Pengpeng Li, Yang Xu, Kunrui Ze, Xingjian Long, Huazi Cao, Guibin Sun

Figure 1 for Memory-Efficient 2D/3D Shape Assembly of Robot Swarms

Figure 2 for Memory-Efficient 2D/3D Shape Assembly of Robot Swarms

Figure 3 for Memory-Efficient 2D/3D Shape Assembly of Robot Swarms

Figure 4 for Memory-Efficient 2D/3D Shape Assembly of Robot Swarms

Abstract:Mean-shift-based approaches have recently emerged as the most effective methods for robot swarm shape assembly tasks. These methods rely on image-based representations of target shapes to compute local density gradients and perform mean-shift exploration, which constitute their core mechanism. However, such image representations incur substantial memory overhead, which can become prohibitive for high-resolution or 3D shapes. To overcome this limitation, we propose a memory-efficient tree map representation that hierarchically encodes user-specified shapes and is applicable to both 2D and 3D scenarios. Building on this representation, we design a behavior-based distributed controller that enables assignment-free shape assembly. Comparative 2D and 3D simulations against a state-of-the-art mean-shift algorithm demonstrate one to two orders of magnitude lower memory usage and two to three times faster shape entry while maintaining comparable uniformity. Finally, we validate the framework through physical experiments with 6 to 7 UAVs, confirming its real-world practicality.

Via

Access Paper or Ask Questions

Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements

Nov 08, 2024

Kunrui Ze, Wei Wang, Shuoyu Yue, Guibin Sun, Kexin Liu, Jinhu Lü

Figure 1 for Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements

Figure 2 for Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements

Figure 3 for Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements

Figure 4 for Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements

Abstract:This article studies the problem of distributed formation control for multiple robots by using onboard ultra wide band (UWB) ranging and inertial odometer (IO) measurements. Although this problem has been widely studied, a fundamental limitation of most works is that they require each robot's pose and sensor measurements are expressed in a common reference frame. However, it is inapplicable for nonholonomic robot formations due to the practical difficulty of aligning IO measurements of individual robot in a common frame. To address this problem, firstly, a concurrent-learning based estimator is firstly proposed to achieve relative localization between neighboring robots in a local frame. Different from most relative localization methods in a global frame, both relative position and orientation in a local frame are estimated with only UWB ranging and IO measurements. Secondly, to deal with information loss caused by directed communication topology, a cooperative localization algorithm is introduced to estimate the relative pose to the leader robot. Thirdly, based on the theoretical results on relative pose estimation, a distributed formation tracking controller is proposed for nonholonomic robots. Both gazebo physical simulation and real-world experiments conducted on networked TurtleBot3 nonholonomic robots are provided to demonstrate the effectiveness of the proposed method.

* 11 pages, 12 figures

Via

Access Paper or Ask Questions

Concurrent-Learning Based Relative Localization in Shape Formation of Robot Swarms

Oct 08, 2024

Jinhu Lü, Kunrui Ze, Shuoyu Yue, Kexin Liu, Wei Wang, Guibin Sun

Figure 1 for Concurrent-Learning Based Relative Localization in Shape Formation of Robot Swarms

Figure 2 for Concurrent-Learning Based Relative Localization in Shape Formation of Robot Swarms

Figure 3 for Concurrent-Learning Based Relative Localization in Shape Formation of Robot Swarms

Figure 4 for Concurrent-Learning Based Relative Localization in Shape Formation of Robot Swarms

Abstract:In this paper, we address the shape formation problem for massive robot swarms in environments where external localization systems are unavailable. Achieving this task effectively with solely onboard measurements is still scarcely explored and faces some practical challenges. To solve this challenging problem, we propose the following novel results. Firstly, to estimate the relative positions among neighboring robots, a concurrent-learning based estimator is proposed. It relaxes the persistent excitation condition required in the classical ones such as least-square estimator. Secondly, we introduce a finite-time agreement protocol to determine the shape location. This is achieved by estimating the relative position between each robot and a randomly assigned seed robot. The initial position of the seed one marks the shape location. Thirdly, based on the theoretical results of the relative localization, a novel behavior-based control strategy is devised. This strategy not only enables adaptive shape formation of large group of robots but also enhances the observability of inter-robot relative localization. Numerical simulation results are provided to verify the performance of our proposed strategy compared to the state-of-the-art ones. Additionally, outdoor experiments on real robots further demonstrate the practical effectiveness and robustness of our methods.

Via

Access Paper or Ask Questions