Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zichen Yan

AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning

Jan 22, 2026

Zichen Yan, Yuchen Hou, Shenao Wang, Yichao Gao, Rui Huang, Lin Zhao

Abstract:Object-Goal Navigation (ObjectNav) requires an agent to autonomously explore an unknown environment and navigate toward target objects specified by a semantic label. While prior work has primarily studied zero-shot ObjectNav under 2D locomotion, extending it to aerial platforms with 3D locomotion capability remains underexplored. Aerial robots offer superior maneuverability and search efficiency, but they also introduce new challenges in spatial perception, dynamic control, and safety assurance. In this paper, we propose AION for vision-based aerial ObjectNav without relying on external localization or global maps. AION is an end-to-end dual-policy reinforcement learning (RL) framework that decouples exploration and goal-reaching behaviors into two specialized policies. We evaluate AION on the AI2-THOR benchmark and further assess its real-time performance in IsaacSim using high-fidelity drone models. Experimental results show that AION achieves superior performance across comprehensive evaluation metrics in exploration, navigation efficiency, and safety. The video can be found at https://youtu.be/TgsUm6bb7zg.

Via

Access Paper or Ask Questions

SIGN: Safety-Aware Image-Goal Navigation for Autonomous Drones via Reinforcement Learning

Aug 17, 2025

Zichen Yan, Rui Huang, Lei He, Shao Guo, Lin Zhao

Abstract:Image-goal navigation (ImageNav) tasks a robot with autonomously exploring an unknown environment and reaching a location that visually matches a given target image. While prior works primarily study ImageNav for ground robots, enabling this capability for autonomous drones is substantially more challenging due to their need for high-frequency feedback control and global localization for stable flight. In this paper, we propose a novel sim-to-real framework that leverages visual reinforcement learning (RL) to achieve ImageNav for drones. To enhance visual representation ability, our approach trains the vision backbone with auxiliary tasks, including image perturbations and future transition prediction, which results in more effective policy training. The proposed algorithm enables end-to-end ImageNav with direct velocity control, eliminating the need for external localization. Furthermore, we integrate a depth-based safety module for real-time obstacle avoidance, allowing the drone to safely navigate in cluttered environments. Unlike most existing drone navigation methods that focus solely on reference tracking or obstacle avoidance, our framework supports comprehensive navigation behaviors--autonomous exploration, obstacle avoidance, and image-goal seeking--without requiring explicit global mapping. Code and model checkpoints will be released upon acceptance.

Via

Access Paper or Ask Questions

FOSP: Fine-tuning Offline Safe Policy through World Models

Jul 06, 2024

Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang

Figure 1 for FOSP: Fine-tuning Offline Safe Policy through World Models

Figure 2 for FOSP: Fine-tuning Offline Safe Policy through World Models

Figure 3 for FOSP: Fine-tuning Offline Safe Policy through World Models

Figure 4 for FOSP: Fine-tuning Offline Safe Policy through World Models

Abstract:Model-based Reinforcement Learning (RL) has shown its high training efficiency and capability of handling high-dimensional tasks. Regarding safety issues, safe model-based RL can achieve nearly zero-cost performance and effectively manage the trade-off between performance and safety. Nevertheless, prior works still pose safety challenges due to the online exploration in real-world deployment. To address this, some offline RL methods have emerged as solutions, which learn from a static dataset in a safe way by avoiding interactions with the environment. In this paper, we aim to further enhance safety during the deployment stage for vision-based robotic tasks by fine-tuning an offline-trained policy. We incorporate in-sample optimization, model-based policy expansion, and reachability guidance to construct a safe offline-to-online framework. Moreover, our method proves to improve the generalization of offline policy in unseen safety-constrained scenarios. Finally, the efficiency of our method is validated on simulation benchmarks with five vision-only tasks and a real robot by solving some deployment problems using limited data.

* 21 pages

Via

Access Paper or Ask Questions

Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy

Mar 04, 2024

Chenyang Cao, Zichen Yan, Renhao Lu, Junbo Tan, Xueqian Wang

Abstract:Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for agents to learn near-optimal policies, these methods encounter limitations when dealing with diverse constraints in complex environments, such as safety constraints. Some of these approaches prioritize goal attainment without considering safety, while others excessively focus on safety at the expense of training efficiency. In this paper, we study the problem of constrained offline GCRL and propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals. To evaluate the method performance, we build a benchmark based on the robot-fetching environment with a randomly positioned obstacle and use expert or random policies to generate an offline dataset. We compare RbSL with three offline GCRL algorithms and one offline safe RL algorithm. As a result, our method outperforms the existing state-of-the-art methods to a large extent. Furthermore, we validate the practicality and effectiveness of RbSL by deploying it on a real Panda manipulator. Code is available at https://github.com/Sunlighted/RbSL.git.

* Accepted by ICRA24

Via

Access Paper or Ask Questions

Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning

Dec 14, 2022

Linrui Zhang, Zichen Yan, Li Shen, Shoujie Li, Xueqian Wang, Dacheng Tao

Figure 1 for Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning

Figure 2 for Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning

Figure 3 for Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning

Figure 4 for Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning

Abstract:Learning a risk-aware policy is essential but rather challenging in unstructured robotic tasks. Safe reinforcement learning methods open up new possibilities to tackle this problem. However, the conservative policy updates make it intractable to achieve sufficient exploration and desirable performance in complex, sample-expensive environments. In this paper, we propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent. Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control. Concretely, the baseline agent is responsible for maximizing rewards under standard RL settings. Thus, it is compatible with off-the-shelf training techniques of unconstrained optimization, exploration and exploitation. On the other hand, the safe agent mimics the baseline agent for policy improvement and learns to fulfill safety constraints via off-policy RL tuning. In contrast to training from scratch, safe policy correction requires significantly fewer interactions to obtain a near-optimal policy. The dual policies can be optimized synchronously via a shared replay buffer, or leveraging the pre-trained model or the non-learning-based controller as a fixed baseline agent. Experimental results show that our approach can learn feasible skills without prior knowledge as well as deriving risk-averse counterparts from pre-trained unsafe policies. The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks with respect to both safety constraint satisfaction and sample efficiency.

* The 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

Via

Access Paper or Ask Questions