Jonathan Kelly

The Importance of Adaptive Decision-Making for Autonomous Long-Range Planetary Surface Mobility

Sep 28, 2024

Olivier Lamarre, Jonathan Kelly

Abstract: Long-distance driving is an important component of planetary surface exploration. Unforeseen events often require human operators to adjust mobility plans, but this approach does not scale and will be insufficient for future missions. Interest in self-reliant rovers is increasing; however, the research community has not yet given significant attention to autonomous, adaptive decision-making. In this paper, we look back at specific planetary mobility operations where human-guided adaptive planning played an important role in mission safety and productivity. Inspired by the abilities of human experts, we identify shortcomings of existing autonomous mobility algorithms for robots operating in off-road environments like planetary surfaces. We advocate for adaptive decision-making capabilities such as unassisted learning from past experiences and more reliance on stochastic world models. The aim of this work is to highlight promising research avenues to enhance ground planning tools and, ultimately, long-range autonomy algorithms on board planetary rovers.

* Accepted to the International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS'24), Brisbane, Australia, Nov. 19-21, 2024

Making Space for Time: The Special Galilean Group and Its Application to Some Robotics Problems

Sep 22, 2024

Jonathan Kelly

Abstract: The special Galilean group, usually denoted SGal(3), is a 10-dimensional Lie group whose important subgroups include the special orthogonal group, the special Euclidean group, and the group of extended poses. We briefly describe SGal(3) and its Lie algebra and show how the group structure supports a unified representation of uncertainty in space and time. Our aim is to highlight the potential usefulness of this group for several robotics problems.

* 3 pages, 1 figure
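
For orientation, the dimension count and the subgroup chain mentioned in the abstract can be read off a commonly used 5×5 matrix representation of SGal(3). The block below is a sketch of that standard form and is not taken from the abstract itself; the symbols C (rotation), v (velocity), r (position), and τ (time offset) are notation assumed here for illustration.

```latex
\mathbf{F} =
\begin{bmatrix}
\mathbf{C} & \mathbf{v} & \mathbf{r} \\
\mathbf{0}^{\mathsf{T}} & 1 & \tau \\
\mathbf{0}^{\mathsf{T}} & 0 & 1
\end{bmatrix}
\in \mathbb{R}^{5 \times 5},
\qquad
\mathbf{C} \in SO(3),\;
\mathbf{v}, \mathbf{r} \in \mathbb{R}^{3},\;
\tau \in \mathbb{R}

% Degrees of freedom: 3 (rotation) + 3 (velocity) + 3 (position) + 1 (time) = 10.
% Setting \tau = 0 leaves the group of extended poses (C, v, r);
% additionally setting v = 0 leaves SE(3); additionally setting r = 0 leaves SO(3).
```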

FaVoR: Features via Voxel Rendering for Camera Relocalization

Sep 11, 2024

Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly

Abstract: Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.

* Submitted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, Arizona, US, Feb 28-Mar 4, 2025
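
The descriptor synthesis step described in the abstract relies on volumetric rendering. As a rough illustration only, the sketch below applies the generic emission-absorption compositing rule (as used in NeRF-style rendering) to feature vectors sampled along one ray. The abstract does not specify the exact formulation used by FaVoR, so the function name `render_descriptor`, its parameters, and the uniform step size are all assumptions made for this example.

```python
import math


def render_descriptor(densities, descriptors, step):
    """Composite per-sample descriptors along a ray (generic volume rendering).

    densities:   non-negative density at each sample along the ray (hypothetical input)
    descriptors: one feature vector per sample, all of equal length
    step:        assumed uniform distance between consecutive samples
    """
    dim = len(descriptors[0])
    rendered = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for sigma, feat in zip(densities, descriptors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this sample
        weight = transmittance * alpha         # contribution of this sample
        for i in range(dim):
            rendered[i] += weight * feat[i]
        transmittance *= 1.0 - alpha
    return rendered


# Usage: an empty sample followed by an opaque one returns the opaque sample's descriptor.
print(render_descriptor([0.0, 1e6], [[1.0, 0.0], [0.0, 1.0]], step=1.0))
```

Because the weights depend only on the densities along the ray, the same voxel can yield a different descriptor from a different viewpoint, which is the property the abstract credits for robustness to view changes.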

Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations

Jul 03, 2024

Trevor Ablett, Bryan Chan, Jayce Haoran Wang, Jonathan Kelly

Abstract: Learning from examples of success is an appealing approach to reinforcement learning that eliminates many of the disadvantages of using hand-crafted reward functions or full expert-demonstration trajectories, both of which can be difficult to acquire, biased, or suboptimal. However, learning from examples alone dramatically increases the exploration challenge, especially for complex tasks. This work introduces value-penalized auxiliary control from examples (VPACE); we significantly improve exploration in example-based control by adding scheduled auxiliary control and examples of auxiliary tasks. Furthermore, we identify a value-calibration problem, where policy value estimates can exceed their theoretical limits based on successful data. We resolve this problem, which is exacerbated by learning auxiliary tasks, through the addition of an above-success-level value penalty. Across three simulated and one real robotic manipulation environment, and 21 different main tasks, we show that our approach substantially improves learning efficiency. Videos, code, and datasets are available at https://papers.starslab.ca/vpace.

* Submitted to the Conference on Robot Learning (CoRL'24), Munich, Germany, Nov. 6-9, 2024

RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

Apr 16, 2024

Ashkan Mirzaei, Riccardo De Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic

Abstract: Neural reconstruction approaches are rapidly emerging as the preferred representation for 3D scenes, but their limited editability still poses a challenge. In this work, we propose an approach for 3D scene inpainting, the task of coherently replacing parts of the reconstructed scene with desired content. Scene inpainting is an inherently ill-posed task as there exist many solutions that plausibly replace the missing content. A good inpainting method should therefore not only enable high-quality synthesis but also a high degree of control. Based on this observation, we focus on enabling explicit control over the inpainted content and leverage a reference image as an efficient means to achieve this goal. Specifically, we introduce RefFusion, a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view. The personalization effectively adapts the prior distribution to the target scene, resulting in a lower variance of the score distillation objective and hence significantly sharper details. Our framework achieves state-of-the-art results for object removal while maintaining high controllability. We further demonstrate the generality of our formulation on other downstream tasks such as object insertion, scene outpainting, and sparse view reconstruction.

* Project page: https://reffusion.github.io

Automated Continuous Force-Torque Sensor Bias Estimation

Mar 02, 2024

Philippe Nadeau, Miguel Rogel Garcia, Emmett Wise, Jonathan Kelly

Abstract: Six-axis force-torque sensors are commonly attached to the wrist of serial robots to measure the external forces and torques acting on the robot's end-effector. These measurements are used for load identification, contact detection, and human-robot interaction, among other applications. Typically, the measurements obtained from the force-torque sensor are more accurate than estimates computed from joint torque readings, as the former are independent of the robot's dynamic and kinematic models. However, the force-torque sensor measurements are affected by a bias that drifts over time, caused by the compounding effects of temperature changes, mechanical stresses, and other factors. In this work, we present a pipeline that continuously estimates the bias and the drift of the bias of a force-torque sensor attached to the wrist of a robot. The first component of the pipeline is a Kalman filter that estimates the kinematic state (position, velocity, and acceleration) of the robot's joints. The second component is a kinematic model that maps the joint-space kinematics to the task-space kinematics of the force-torque sensor. Finally, the third component is a Kalman filter that estimates the bias and the drift of the bias of the force-torque sensor, assuming that the inertial parameters of the gripper attached to the distal end of the force-torque sensor are known with certainty.

* Technical Report STARS-2024-001, University of Toronto Institute for Aerospace Studies (7 pages, 0 figures)
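
To make the third pipeline component more concrete, here is a minimal single-axis sketch of a Kalman filter that tracks a bias and its drift rate. It is not the report's implementation: it assumes the load predicted from the known gripper inertial parameters has already been subtracted from each reading (so each input is a noisy observation of the bias alone), and it assumes a constant-drift process model. The function name `estimate_bias_and_drift` and all noise parameters are hypothetical.

```python
def estimate_bias_and_drift(residuals, dt, meas_var, q_bias=1e-8, q_drift=1e-10):
    """Scalar-measurement Kalman filter with state [bias, drift rate].

    residuals: sensor reading minus the predicted gripper load (assumed precomputed)
    dt:        sampling period in seconds
    meas_var:  measurement noise variance (assumed known)
    """
    bias, drift = 0.0, 0.0
    # Symmetric 2x2 covariance stored as three scalars.
    p00, p01, p11 = 1.0, 0.0, 1.0
    for z in residuals:
        # Predict with F = [[1, dt], [0, 1]]: the bias integrates the drift rate.
        bias += dt * drift
        p00 = p00 + 2.0 * dt * p01 + dt * dt * p11 + q_bias
        p01 = p01 + dt * p11
        p11 = p11 + q_drift
        # Update with H = [1, 0]: only the bias is observed directly.
        s = p00 + meas_var
        k0, k1 = p00 / s, p01 / s
        innovation = z - bias
        bias += k0 * innovation
        drift += k1 * innovation
        # Covariance update P = (I - K H) P, using the pre-update p01 and p00.
        p11 = p11 - k1 * p01
        p01 = p01 - k0 * p01
        p00 = p00 - k0 * p00
    return bias, drift


# Usage: noiseless readings with a 0.5 N offset drifting at 0.01 N/s.
dt = 0.1
readings = [0.5 + 0.01 * k * dt for k in range(500)]
print(estimate_bias_and_drift(readings, dt, meas_var=1e-4))
```

A full six-axis version would carry one such state per force and torque channel, or a joint 12-dimensional state if cross-axis correlations matter.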

PhotoBot: Reference-Guided Interactive Photography via Natural Language

Jan 19, 2024

Oliver Limoyo, Jimmy Li, Dmitriy Rivkin, Jonathan Kelly, Gregory Dudek

Abstract: We introduce PhotoBot, a framework for automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via a reference picture that is retrieved from a curated gallery. We exploit a visual language model (VLM) and an object detector to characterize reference pictures via textual descriptions and use a large language model (LLM) to retrieve relevant reference pictures based on a user's language query through text-based reasoning. To establish correspondences between the reference picture and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across significantly varying images. Using these features, we compute pose adjustments for an RGB-D camera by solving a Perspective-n-Point (PnP) problem. We demonstrate our approach on a real-world manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback.

Safe Mission-Level Path Planning for Exploration of Lunar Shadowed Regions by a Solar-Powered Rover

Jan 16, 2024

Olivier Lamarre, Shantanu Malhotra, Jonathan Kelly

Abstract: Exploration of the lunar south pole with a solar-powered rover is challenging due to the highly dynamic solar illumination conditions and the presence of permanently shadowed regions (PSRs). In turn, careful planning in space and time is essential. Mission-level path planning is a global, spatiotemporal paradigm that addresses this challenge, taking into account rover resources and mission requirements. However, existing approaches do not proactively account for random disturbances, such as recurring faults, that may temporarily delay rover traverse progress. In this paper, we formulate a chance-constrained mission-level planning problem for the exploration of PSRs by a solar-powered rover affected by random faults. The objective is to find a policy that visits as many waypoints of scientific interest as possible while respecting an upper bound on the probability of mission failure. Our approach assumes that faults occur randomly, but at a known, constant average rate. Each fault is resolved within a fixed time, simulating the recovery period of an autonomous system or the time required for a team of human operators to intervene. Unlike solutions based upon dynamic programming alone, our method breaks the chance-constrained optimization problem into smaller offline and online subtasks to make the problem computationally tractable. Specifically, our solution combines existing mission-level path planning techniques with a stochastic reachability analysis component. We find mission plans that remain within reach of safety throughout large state spaces. To empirically validate our algorithm, we simulate mission scenarios using orbital terrain and illumination maps of Cabeus Crater. Results from simulations of multi-day, long-range drives in the LCROSS impact region are also presented.

* Accepted to the IEEE Aerospace Conference (AERO'24), Big Sky, Montana, March 2-9, 2024
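
The fault model stated in the abstract (random faults at a known, constant average rate, each resolved in a fixed time) lends itself to a small worked calculation. The sketch below is an illustration rather than the paper's algorithm: it assumes fault arrivals during driving follow a Poisson process and that a single traverse leg has a fixed time margin before safety is lost. The function name `prob_delay_exceeds_slack` and its parameters are hypothetical.

```python
import math


def prob_delay_exceeds_slack(rate_per_hour, drive_hours, recovery_hours, slack_hours):
    """Probability that cumulative fault recovery time exceeds the available slack.

    Assumes Poisson fault arrivals while driving (an assumption of this sketch)
    and a fixed recovery time per fault, as described in the abstract.
    """
    # Largest number of faults whose total recovery time still fits in the slack.
    max_faults = math.floor(slack_hours / recovery_hours)
    mean_faults = rate_per_hour * drive_hours
    # Poisson CDF: P(N <= max_faults); an empty sum (negative slack) gives 0.
    cdf = sum(
        math.exp(-mean_faults) * mean_faults ** k / math.factorial(k)
        for k in range(max_faults + 1)
    )
    return 1.0 - cdf


# Usage: check a single leg against a 5% bound on the probability of failure.
p_fail = prob_delay_exceeds_slack(
    rate_per_hour=0.1, drive_hours=10.0, recovery_hours=1.0, slack_hours=3.5
)
print(p_fail, p_fail <= 0.05)
```

A chance-constrained planner has to reason about this kind of probability over entire policies and time-varying illumination rather than one leg in isolation, which is where the stochastic reachability analysis mentioned in the abstract comes in.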

Working Backwards: Learning to Place by Picking

Dec 04, 2023

Oliver Limoyo, Abhisek Konar, Trevor Ablett, Jonathan Kelly, Francois R. Hogan, Gregory Dudek

Abstract: We present Learning to Place by Picking (LPP), a method capable of autonomously collecting demonstrations for a family of placing tasks in which objects must be manipulated to specific locations. With LPP, we approach the learning of robotic object placement policies by reversing the grasping process and exploiting the inherent symmetry of the pick and place problems. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects that are initially located at their target placement locations. Our system is capable of collecting hundreds of demonstrations without human intervention by using a combination of tactile sensing and compliant control for grasps. We train a policy directly from visual observations through behaviour cloning, using the autonomously collected demonstrations. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table and not at the original placement location). We validate our approach on home robotic scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of performance and data efficiency, while requiring no human supervision.

Push it to the Demonstrated Limit: Multimodal Visuotactile Imitation Learning with Force Matching

Nov 02, 2023

Trevor Ablett, Oliver Limoyo, Adam Sigal, Affan Jilani, Jonathan Kelly, Kaleem Siddiqi, Francois Hogan, Gregory Dudek

Abstract: Optical tactile sensors have emerged as an effective means to acquire dense contact information during robotic manipulation. A recently introduced 'see-through-your-skin' (STS) variant of this type of sensor has both visual and tactile modes, enabled by leveraging a semi-transparent surface and controllable lighting. In this work, we investigate the benefits of pairing visuotactile sensing with imitation learning for contact-rich manipulation tasks. First, we use tactile force measurements and a novel algorithm during kinesthetic teaching to yield a force profile that better matches that of the human demonstrator. Second, we add visual/tactile STS mode switching as a control policy output, simplifying the application of the sensor. Finally, we study multiple observation configurations to compare and contrast the value of visual/tactile data (both with and without mode switching) with visual data from a wrist-mounted eye-in-hand camera. We perform an extensive series of experiments on a real robotic manipulator with door-opening and door-closing tasks, including over 3,000 real test episodes. Our results highlight the importance of tactile sensing for imitation learning, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.

* 8 pages, 11 figures
