Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinyun Chi

DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation

Sep 05, 2025

Tien Pham, Xinyun Chi, Khang Nguyen, Manfred Huber, Angelo Cangelosi

Abstract:Reinforcement learning (RL) agents can learn to solve complex tasks from visual inputs, but generalizing these learned skills to new environments remains a major challenge in RL application, especially robotics. While data augmentation can improve generalization, it often compromises sample efficiency and training stability. This paper introduces DeGuV, an RL framework that enhances both generalization and sample efficiency. In specific, we leverage a learnable masker network that produces a mask from the depth input, preserving only critical visual information while discarding irrelevant pixels. Through this, we ensure that our RL agents focus on essential features, improving robustness under data augmentation. In addition, we incorporate contrastive learning and stabilize Q-value estimation under augmentation to further enhance sample efficiency and training stability. We evaluate our proposed method on the RL-ViGen benchmark using the Franka Emika robot and demonstrate its effectiveness in zero-shot sim-to-real transfer. Our results show that DeGuV outperforms state-of-the-art methods in both generalization and sample efficiency while also improving interpretability by highlighting the most relevant regions in the visual input

Via

Access Paper or Ask Questions

Joint Action Language Modelling for Transparent Policy Execution

Apr 14, 2025

Theodor Wulff, Rahul Singh Maharjan, Xinyun Chi, Angelo Cangelosi

Figure 1 for Joint Action Language Modelling for Transparent Policy Execution

Figure 2 for Joint Action Language Modelling for Transparent Policy Execution

Figure 3 for Joint Action Language Modelling for Transparent Policy Execution

Figure 4 for Joint Action Language Modelling for Transparent Policy Execution

Abstract:An agent's intention often remains hidden behind the black-box nature of embodied policies. Communication using natural language statements that describe the next action can provide transparency towards the agent's behavior. We aim to insert transparent behavior directly into the learning process, by transforming the problem of policy learning into a language generation problem and combining it with traditional autoregressive modelling. The resulting model produces transparent natural language statements followed by tokens representing the specific actions to solve long-horizon tasks in the Language-Table environment. Following previous work, the model is able to learn to produce a policy represented by special discretized tokens in an autoregressive manner. We place special emphasis on investigating the relationship between predicting actions and producing high-quality language for a transparent agent. We find that in many cases both the quality of the action trajectory and the transparent statement increase when they are generated simultaneously.

Via

Access Paper or Ask Questions