Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sehoon Ha

Georgia Tech

PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies

Feb 17, 2025

Morgan Byrd, Jackson Crandell, Mili Das, Jessica Inman, Robert Wright, Sehoon Ha

Abstract:Numerous real-world control problems involve dynamics and objectives affected by unobservable hidden parameters, ranging from autonomous driving to robotic manipulation, which cause performance degradation during sim-to-real transfer. To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems where hidden variables parameterize transition and reward functions. Existing approaches, such as domain randomization, domain adaptation, and meta-learning, simply treat the effect of hidden parameters as additional variance and often struggle to effectively handle HIP-MDP problems, especially when the rewards are parameterized by hidden variables. We introduce Privileged-Dreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter estimation module. PrivilegedDreamer features its novel dual recurrent architecture that explicitly estimates hidden parameters from limited historical data and enables us to condition the model, actor, and critic networks on these estimated parameters. Our empirical analysis on five diverse HIP-MDP tasks demonstrates that PrivilegedDreamer outperforms state-of-the-art model-based, model-free, and domain adaptation learning algorithms. Additionally, we conduct ablation studies to justify the inclusion of each component in the proposed architecture.

* Accepted to ICRA 2025. Website: https://morganbyrd03.github.io/icra25_privileged_dreamer/

Via

Access Paper or Ask Questions

RobotMover: Learning to Move Large Objects by Imitating the Dynamic Chain

Feb 07, 2025

Tianyu Li, Joanne Truong, Jimmy Yang, Alexander Clegg, Akshara Rai, Sehoon Ha, Xavier Puig

Abstract:Moving large objects, such as furniture, is a critical capability for robots operating in human environments. This task presents significant challenges due to two key factors: the need to synchronize whole-body movements to prevent collisions between the robot and the object, and the under-actuated dynamics arising from the substantial size and weight of the objects. These challenges also complicate performing these tasks via teleoperation. In this work, we introduce \method, a generalizable learning framework that leverages human-object interaction demonstrations to enable robots to perform large object manipulation tasks. Central to our approach is the Dynamic Chain, a novel representation that abstracts human-object interactions so that they can be retargeted to robotic morphologies. The Dynamic Chain is a spatial descriptor connecting the human and object root position via a chain of nodes, which encode the position and velocity of different interaction keypoints. We train policies in simulation using Dynamic-Chain-based imitation rewards and domain randomization, enabling zero-shot transfer to real-world settings without fine-tuning. Our approach outperforms both learning-based methods and teleoperation baselines across six evaluation metrics when tested on three distinct object types, both in simulation and on physical hardware. Furthermore, we successfully apply the learned policies to real-world tasks, such as moving a trash cart and rearranging chairs.

Via

Access Paper or Ask Questions

Understanding Expectations for a Robotic Guide Dog for Visually Impaired People

Jan 08, 2025

J. Taery Kim, Morgan Byrd, Jack L. Crandell, Bruce N. Walker, Greg Turk, Sehoon Ha

Figure 1 for Understanding Expectations for a Robotic Guide Dog for Visually Impaired People

Figure 2 for Understanding Expectations for a Robotic Guide Dog for Visually Impaired People

Figure 3 for Understanding Expectations for a Robotic Guide Dog for Visually Impaired People

Figure 4 for Understanding Expectations for a Robotic Guide Dog for Visually Impaired People

Abstract:Robotic guide dogs hold significant potential to enhance the autonomy and mobility of blind or visually impaired (BVI) individuals by offering universal assistance over unstructured terrains at affordable costs. However, the design of robotic guide dogs remains underexplored, particularly in systematic aspects such as gait controllers, navigation behaviors, interaction methods, and verbal explanations. Our study addresses this gap by conducting user studies with 18 BVI participants, comprising 15 cane users and three guide dog users. Participants interacted with a quadrupedal robot and provided both quantitative and qualitative feedback. Our study revealed several design implications, such as a preference for a learning-based controller and a rigid handle, gradual turns with asymmetric speeds, semantic communication methods, and explainability. The study also highlighted the importance of customization to support users with diverse backgrounds and preferences, along with practical concerns such as battery life, maintenance, and weather issues. These findings offer valuable insights and design implications for future research and development of robotic guide dogs.

* 12 pages, 4 figures, Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction (HRI'25)

Via

Access Paper or Ask Questions

Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Sep 30, 2024

Fukang Liu, Zhaoyuan Gu, Yilin Cai, Ziyi Zhou, Shijie Zhao, Hyunyoung Jung, Sehoon Ha, Yue Chen, Danfei Xu, Ye Zhao

Figure 1 for Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Figure 2 for Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Figure 3 for Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Figure 4 for Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Abstract:Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer precise and systematic control but are limited by high computational complexity and accurate contact sensing. On the other hand, reinforcement learning (RL) provides robustness and handles high-dimensional spaces but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. We generate reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and train RL policies to track these trajectories. Our results demonstrate that Opt2Skill outperforms pure RL methods in both training efficiency and task performance, with optimal trajectories that account for torque limits enhancing trajectory tracking. We successfully transfer our approach to real-world applications.

Via

Access Paper or Ask Questions

HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation

Sep 22, 2024

Naoki Yokoyama, Ram Ramrakhya, Abhishek Das, Dhruv Batra, Sehoon Ha

Figure 1 for HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation

Figure 2 for HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation

Figure 3 for HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation

Figure 4 for HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation

Abstract:We present the Habitat-Matterport 3D Open Vocabulary Object Goal Navigation dataset (HM3D-OVON), a large-scale benchmark that broadens the scope and semantic range of prior Object Goal Navigation (ObjectNav) benchmarks. Leveraging the HM3DSem dataset, HM3D-OVON incorporates over 15k annotated instances of household objects across 379 distinct categories, derived from photo-realistic 3D scans of real-world environments. In contrast to earlier ObjectNav datasets, which limit goal objects to a predefined set of 6-20 categories, HM3D-OVON facilitates the training and evaluation of models with an open-set of goals defined through free-form language at test-time. Through this open-vocabulary formulation, HM3D-OVON encourages progress towards learning visuo-semantic navigation behaviors that are capable of searching for any object specified by text in an open-vocabulary manner. Additionally, we systematically evaluate and compare several different types of approaches on HM3D-OVON. We find that HM3D-OVON can be used to train an open-vocabulary ObjectNav agent that achieves both higher performance and is more robust to localization and actuation noise than the state-of-the-art ObjectNav approach. We hope that our benchmark and baseline results will drive interest in developing embodied agents that can navigate real-world spaces to find household objects specified through free-form language, taking a step towards more flexible and human-like semantic visual navigation. Code and videos available at: naoki.io/ovon.

Via

Access Paper or Ask Questions

Learning to enhance multi-legged robot on rugged landscapes

Sep 14, 2024

Juntao He, Baxi Chong, Zhaochen Xu, Sehoon Ha, Daniel I. Goldman

Figure 1 for Learning to enhance multi-legged robot on rugged landscapes

Figure 2 for Learning to enhance multi-legged robot on rugged landscapes

Figure 3 for Learning to enhance multi-legged robot on rugged landscapes

Figure 4 for Learning to enhance multi-legged robot on rugged landscapes

Abstract:Navigating rugged landscapes poses significant challenges for legged locomotion. Multi-legged robots (those with 6 and greater) offer a promising solution for such terrains, largely due to their inherent high static stability, resulting from a low center of mass and wide base of support. Such systems require minimal effort to maintain balance. Recent studies have shown that a linear controller, which modulates the vertical body undulation of a multi-legged robot in response to shifts in terrain roughness, can ensure reliable mobility on challenging terrains. However, the potential of a learning-based control framework that adjusts multiple parameters to address terrain heterogeneity remains underexplored. We posit that the development of an experimentally validated physics-based simulator for this robot can rapidly advance capabilities by allowing wide parameter space exploration. Here we develop a MuJoCo-based simulator tailored to this robotic platform and use the simulation to develop a reinforcement learning-based control framework that dynamically adjusts horizontal and vertical body undulation, and limb stepping in real-time. Our approach improves robot performance in simulation, laboratory experiments, and outdoor tests. Notably, our real-world experiments reveal that the learning-based controller achieves a 30\% to 50\% increase in speed compared to a linear controller, which only modulates vertical body waves. We hypothesize that the superior performance of the learning-based controller arises from its ability to adjust multiple parameters simultaneously, including limb stepping, horizontal body wave, and vertical body wave.

* Submitted to ICRA 2025

Via

Access Paper or Ask Questions

Language Guided Skill Discovery

Jun 07, 2024

Seungeun Rho, Laura Smith, Tianyu Li, Sergey Levine, Xue Bin Peng, Sehoon Ha

Figure 1 for Language Guided Skill Discovery

Figure 2 for Language Guided Skill Discovery

Figure 3 for Language Guided Skill Discovery

Figure 4 for Language Guided Skill Discovery

Abstract:Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity" of skills. We hypothesize that leveraging the semantic knowledge of large language models (LLMs) can lead us to improve semantic diversity of resulting behaviors. In this sense, we introduce Language Guided Skill Discovery (LGSD), a skill discovery framework that aims to directly maximize the semantic diversity between skills. LGSD takes user prompts as input and outputs a set of semantically distinctive skills. The prompts serve as a means to constrain the search space into a semantically desired subspace, and the generated LLM outputs guide the agent to visit semantically diverse states within the subspace. We demonstrate that LGSD enables legged robots to visit different user-intended areas on a plane by simply changing the prompt. Furthermore, we show that language guidance aids in discovering more diverse skills compared to five existing skill discovery methods in robot-arm manipulation environments. Lastly, LGSD provides a simple way of utilizing learned skills via natural language.

Via

Access Paper or Ask Questions

Learning-based legged locomotion; state of the art and future perspectives

Jun 03, 2024

Sehoon Ha, Joonho Lee, Michiel van de Panne, Zhaoming Xie, Wenhao Yu, Majid Khadiv

Abstract:Legged locomotion holds the premise of universal mobility, a critical capability for many real-world robotic applications. Both model-based and learning-based approaches have advanced the field of legged locomotion in the past three decades. In recent years, however, a number of factors have dramatically accelerated progress in learning-based methods, including the rise of deep learning, rapid progress in simulating robotic systems, and the availability of high-performance and affordable hardware. This article aims to give a brief history of the field, to summarize recent efforts in learning locomotion skills for quadrupeds, and to provide researchers new to the area with an understanding of the key issues involved. With the recent proliferation of humanoid robots, we further outline the rapid rise of analogous methods for bipedal locomotion. We conclude with a discussion of open problems as well as related societal impact.

Via

Access Paper or Ask Questions

VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Dec 06, 2023

Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher

Figure 1 for VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Figure 2 for VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Figure 3 for VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Figure 4 for VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Abstract:Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM builds occupancy maps from depth observations to identify frontiers, and leverages RGB observations and a pre-trained vision-language model to generate a language-grounded value map. VLFM then uses this map to identify the most promising frontier to explore for finding an instance of a given target object category. We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator. Remarkably, VLFM achieves state-of-the-art results on all three datasets as measured by success weighted by path length (SPL) for the Object Goal Navigation task. Furthermore, we show that VLFM's zero-shot nature enables it to be readily deployed on real-world robots such as the Boston Dynamics Spot mobile manipulation platform. We deploy VLFM on Spot and demonstrate its capability to efficiently navigate to target objects within an office building in the real world, without any prior knowledge of the environment. The accomplishments of VLFM underscore the promising potential of vision-language models in advancing the field of semantic navigation. Videos of real-world deployment can be viewed at naoki.io/vlfm.

Via

Access Paper or Ask Questions

Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

Nov 04, 2023

Donghoon Youm, Hyunyoung Jung, Hyeongjun Kim, Jemin Hwangbo, Hae-Won Park, Sehoon Ha

Figure 1 for Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

Figure 2 for Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

Figure 3 for Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

Figure 4 for Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

Abstract:Control of legged robots is a challenging problem that has been investigated by different approaches, such as model-based control and learning algorithms. This work proposes a novel Imitating and Finetuning Model Predictive Control (IFM) framework to take the strengths of both approaches. Our framework first develops a conventional model predictive controller (MPC) using Differential Dynamic Programming and Raibert heuristic, which serves as an expert policy. Then we train a clone of the MPC using imitation learning to make the controller learnable. Finally, we leverage deep reinforcement learning with limited exploration for further finetuning the policy on more challenging terrains. By conducting comprehensive simulation and hardware experiments, we demonstrate that the proposed IFM framework can significantly improve the performance of the given MPC controller on rough, slippery, and conveyor terrains that require careful coordination of footsteps. We also showcase that IFM can efficiently produce more symmetric, periodic, and energy-efficient gaits compared to Vanilla RL with a minimal burden of reward shaping.

Via

Access Paper or Ask Questions