Sichang Su

Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Jun 12, 2026

He Zhang, Lingzhu Xiang, Haitao Lin, Zeyu Huang, Minghui Wang, Dingyan Zhong, Yubo Dong, Yihao Wu, Yongming Rao, Dongsheng Zhang (+16 more)

Abstract: In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment. Each component serves a distinct role in this stack.

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

May 29, 2025

Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang

Abstract: We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. After fine-tuning in challenging legged locomotion tasks, the episode reward of Rectified Flow policies grew by an average of 135.36% net, while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of Shortcut Model policies in state-based and visual manipulation tasks increased by an average of 40.34% net after fine-tuning with ReinFlow at four or even one denoising step, with performance comparable to fine-tuned DDIM policies while saving an average of 23.20% of computation time. Project webpage: https://reinflow.github.io/

* 30 pages, 13 figures, 10 tables
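
The noise-injection idea in the ReinFlow abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `velocity` and `noise_std` are hypothetical stand-ins for the learned velocity and noise networks, and the Euler discretization with Gaussian transitions is an assumption made for illustration. The point it shows is that once each denoising step becomes a Gaussian transition, the log-likelihood of the whole chain is a plain sum of per-step Gaussian log-densities.

```python
import math
import random


def velocity(a, t):
    # Hypothetical stand-in for a learned velocity network: drifts toward 1.0.
    return [1.0 - x for x in a]


def noise_std(a, t):
    # Hypothetical stand-in for a learnable noise network; constant here.
    return [0.1 for _ in a]


def gaussian_log_prob(x, mean, std):
    # Log-density of a diagonal Gaussian, summed over dimensions.
    return sum(
        -0.5 * ((xi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
        for xi, mi, si in zip(x, mean, std)
    )


def sample_noisy_flow(a0, num_steps, rng):
    """Integrate the flow with Euler steps, adding Gaussian noise at each step.

    Returns the final action and the log-probability of the sampled chain
    conditioned on the starting point a0 (the density of a0 is not included).
    """
    dt = 1.0 / num_steps
    a = list(a0)
    log_prob = 0.0
    for k in range(num_steps):
        t = k * dt
        v = velocity(a, t)
        # Deterministic Euler step gives the mean of the transition.
        mean = [ai + vi * dt for ai, vi in zip(a, v)]
        std = noise_std(a, t)
        # Injected noise turns the step into a Gaussian Markov transition.
        nxt = [rng.gauss(m, s) for m, s in zip(mean, std)]
        log_prob += gaussian_log_prob(nxt, mean, std)
        a = nxt
    return a, log_prob


if __name__ == "__main__":
    action, logp = sample_noisy_flow([0.0, 0.0], num_steps=4, rng=random.Random(0))
    print(action, logp)
```

A chain log-probability of this kind is what a policy-gradient method needs in order to weight sampled actions; with `num_steps=1` the chain reduces to a single Gaussian transition.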

Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning

Aug 24, 2024

Mingliang Zhang, Sichang Su, Chengyang He, Guillaume Sartoretti

Abstract: In multi-agent reinforcement learning (MARL), achieving multi-task generalization to diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily focus on single-task performance, but their lack of multi-task generalization capabilities typically results in substantial computational waste and limited real-life applicability. Meanwhile, existing offline multi-task MARL approaches are heavily dependent on data quality, often resulting in poor performance on unseen tasks. In this paper, we introduce HyGen (Hybrid Training for Enhanced Multi-Task Generalization), a novel hybrid MARL framework that integrates online and offline learning to ensure both multi-task generalization and training efficiency. Specifically, our framework extracts potential general skills from offline multi-task datasets. We then train policies to select the optimal skills under the centralized training and decentralized execution (CTDE) paradigm. During this stage, we utilize a replay buffer that integrates both offline data and online interactions. We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks. Comparative analyses on the StarCraft Multi-Agent Challenge show that HyGen outperforms a wide range of existing purely online and purely offline methods.
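
The replay buffer that integrates offline data and online interactions, as mentioned in the HyGen abstract, could look roughly like the sketch below. The abstract does not say how the two sources are combined, so the class name `HybridReplayBuffer`, the fixed `offline_fraction` mixing ratio, and the uniform sampling are all assumptions made for illustration rather than the paper's design.

```python
import random
from collections import deque


class HybridReplayBuffer:
    """Hypothetical buffer mixing a fixed offline dataset with online data."""

    def __init__(self, offline_data, online_capacity, offline_fraction, seed=0):
        if not 0.0 <= offline_fraction <= 1.0:
            raise ValueError("offline_fraction must be in [0, 1]")
        self.offline = list(offline_data)           # fixed offline transitions
        self.online = deque(maxlen=online_capacity)  # oldest online data is evicted
        self.offline_fraction = offline_fraction     # assumed mixing ratio
        self.rng = random.Random(seed)

    def add_online(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        if not self.offline and not self.online:
            raise ValueError("cannot sample from an empty buffer")
        n_offline = round(batch_size * self.offline_fraction)
        # Fall back to whichever source has data if the other is empty.
        if not self.online:
            n_offline = batch_size
        elif not self.offline:
            n_offline = 0
        n_online = batch_size - n_offline
        # Uniform sampling with replacement from each source.
        batch = [self.rng.choice(self.offline) for _ in range(n_offline)]
        batch += [self.rng.choice(self.online) for _ in range(n_online)]
        self.rng.shuffle(batch)
        return batch


if __name__ == "__main__":
    buf = HybridReplayBuffer([("offline", i) for i in range(10)], 5, 0.5)
    for i in range(7):
        buf.add_online(("online", i))
    print(buf.sample(4))
```

Until online interactions arrive, every batch is drawn from the offline dataset; afterwards each batch holds the configured share from each source.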
