Shangda Li

Unsupervised Domain Adaptation for Visual Navigation

Nov 12, 2020

Shangda Li, Devendra Singh Chaplot, Yao-Hung Hubert Tsai, Yue Wu, Louis-Philippe Morency, Ruslan Salakhutdinov

Abstract: Advances in visual navigation methods have led to intelligent embodied navigation agents capable of learning meaningful representations from raw RGB images and performing a wide variety of tasks involving structural and semantic reasoning. However, most learning-based navigation policies are trained and tested in simulation environments. For these policies to be practically useful, they need to be transferred to the real world. In this paper, we propose an unsupervised domain adaptation method for visual navigation. Our method translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy. The proposed method outperforms several baselines across two different navigation tasks in simulation. We further show that our method can be used to transfer the navigation policies learned in simulation to the real world.

* Deep Reinforcement Learning Workshop at NeurIPS 2020. Camera Ready Version

Distributional Advantage Actor-Critic

Jun 10, 2018

Shangda Li, Selina Bing, Steven Yang

Abstract: In traditional reinforcement learning, an agent maximizes the reward collected during its interaction with the environment by approximating the optimal policy through the estimation of value functions. Typically, given a state s and an action a, the corresponding value is the expected discounted sum of rewards. The optimal action is then chosen to be the action a with the largest value estimated by the value function. However, recent developments have shown both theoretical and experimental evidence of superior performance when the value function is replaced with a value distribution in the context of deep Q-learning [1]. In this paper, we develop a new algorithm that combines advantage actor-critic with a value distribution estimated by quantile regression. We evaluated this new algorithm, termed Distributional Advantage Actor-Critic (DA2C or QR-A2C), on a variety of tasks, and observed it to perform at least as well as the baseline algorithms and to outperform the baselines on some tasks, with smaller variance and increased stability.
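
The two quantities the abstract mentions, the discounted sum of rewards and a value distribution fitted by quantile regression, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the midpoint quantile levels, and the use of the plain (non-Huber) pinball loss are assumptions made here for illustration, and the actor-critic update itself is omitted.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum over t of gamma**t * rewards[t], accumulated backward."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def quantile_midpoints(n: int) -> List[float]:
    """Assumed quantile levels: tau_i = (2i + 1) / (2n) for i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def quantile_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean pinball loss of each predicted quantile against every target sample.

    For error u = target - prediction at level tau, the pinball loss is
    tau * u when u >= 0 and (tau - 1) * u when u < 0.
    """
    taus = quantile_midpoints(len(predicted))
    total = 0.0
    for tau, q in zip(taus, predicted):
        for y in targets:
            u = y - q
            total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / (len(predicted) * len(targets))
```

In a distributional critic, `predicted` would hold the quantile estimates of the return for a state and `targets` would hold samples of the bootstrapped return; minimizing the loss pushes each estimate toward its assumed quantile level of the target distribution.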
