Robert Lee

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

Nov 23, 2023
Pengyuan Wang, Takuya Ikeda, Robert Lee, Koichi Nishiwaki

Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but they are typically hindered by the need for large datasets of either pose-labelled real images or carefully tuned photorealistic simulators. This can be avoided by using only geometric inputs such as depth images to reduce the domain gap, but these approaches suffer from a lack of semantic information, which can be vital for pose estimation. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model. Our approach projects 2D features from this foundation model into 3D for a single object model per category, and then matches new single-view observations of unseen object instances against this model with a trained matching network. This requires significantly less data to train than prior methods, since the semantic features are robust to object texture and appearance. We demonstrate this with a rich evaluation, showing improved performance over prior methods with a fraction of the data required.
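
As a rough illustration of the feature-lifting step, the sketch below back-projects per-pixel features into 3D using a depth map and camera intrinsics, then matches observation features against a reference model by cosine similarity. This is a minimal numpy sketch under simplifying assumptions (the function names, the dense per-pixel feature map, and the nearest-neighbour matching are illustrative), not the paper's implementation, which uses a trained matching network.

```python
import numpy as np

def backproject_features(depth, feats, K):
    """Lift per-pixel features into 3D using a depth map and camera intrinsics K.

    depth: (H, W) metric depth; feats: (H, W, C) per-pixel features from a
    pre-trained backbone; K: (3, 3) intrinsics.
    Returns (N, 3) points and (N, C) features for valid-depth pixels.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    points = np.stack([x, y, z], axis=-1)   # (N, 3)
    features = feats[valid]                 # (N, C)
    return points, features

def match_to_model(obs_feats, model_feats):
    """Nearest-neighbour matching in feature space (cosine similarity).

    obs_feats: (N, C) observation features; model_feats: (M, C) features stored
    on the per-category 3D model. Returns, for each observation point, the index
    of the best-matching model point.
    """
    a = obs_feats / np.linalg.norm(obs_feats, axis=1, keepdims=True)
    b = model_feats / np.linalg.norm(model_feats, axis=1, keepdims=True)
    sim = a @ b.T                           # (N, M)
    return sim.argmax(axis=1)
```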

Learning Fabric Manipulation in the Real World with Human Videos

Nov 12, 2022
Robert Lee, Jad Abou-Chakra, Fangyi Zhang, Peter Corke

Fabric manipulation is a long-standing challenge in robotics due to the enormous state space and complex dynamics. Learning approaches stand out as promising for this domain, as they allow us to learn behaviours directly from data. Most prior methods, however, rely heavily on simulation, which is still limited by the large sim-to-real gap of deformable objects, or depend on large datasets. A promising alternative is to learn fabric manipulation directly from watching humans perform the task. In this work, we explore how demonstrations for fabric manipulation tasks can be collected directly by humans, providing an extremely natural and fast data collection pipeline. Then, using only a handful of such demonstrations, we show how a pick-and-place policy can be learned and deployed on a real robot without any robot data collection at all. We demonstrate our approach on a fabric folding task, showing that our policy can reliably reach folded states from crumpled initial configurations. Videos are available at: https://sites.google.com/view/foldingbyhand
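
For intuition about how a handful of human demonstrations can drive a pick-and-place policy without any robot data, here is a deliberately simplified nearest-demonstration baseline: embed the current image, retrieve the most similar demonstration frame, and reuse its recorded pick and place pixels. The class, the embed_fn encoder, and the data layout are illustrative assumptions, not the policy trained in the paper.

```python
import numpy as np

class DemoPickPlacePolicy:
    """Minimal nearest-demonstration pick-and-place policy.

    Each human demonstration step is stored as (image, pick_pixel, place_pixel).
    At test time, the policy embeds the current image, finds the most similar
    demonstration frame, and returns that frame's recorded pick/place action.
    """

    def __init__(self, demo_images, demo_picks, demo_places, embed_fn):
        # embed_fn is a placeholder for any image encoder (e.g. a pre-trained
        # visual backbone) that maps an image to a fixed-size vector.
        self.embed_fn = embed_fn
        self.embeddings = np.stack([embed_fn(img) for img in demo_images])
        self.picks = np.asarray(demo_picks)    # (N, 2) pixel coordinates
        self.places = np.asarray(demo_places)  # (N, 2) pixel coordinates

    def act(self, image):
        z = self.embed_fn(image)
        dists = np.linalg.norm(self.embeddings - z, axis=1)
        i = int(dists.argmin())
        return self.picks[i], self.places[i]
```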

Learning Arbitrary-Goal Fabric Folding with One Hour of Real Robot Experience

Oct 07, 2020
Robert Lee, Daniel Ward, Akansel Cosgun, Vibhavari Dasagi, Peter Corke, Jurgen Leitner

Manipulating deformable objects, such as fabric, is a long-standing problem in robotics, with state estimation and control posing a significant challenge for traditional methods. In this paper, we show that it is possible to learn fabric folding skills in only an hour of self-supervised real robot experience, without human supervision or simulation. Our approach relies on fully convolutional networks and the manipulation of visual inputs to exploit learned features, allowing us to create an expressive goal-conditioned pick-and-place policy that can be trained efficiently with real-world robot data only. Folding skills are learned with only a sparse reward function and thus do not require reward function engineering, merely an image of the goal configuration. We demonstrate our method on a set of towel-folding tasks, and show that our approach is able to discover sequential folding strategies purely from trial and error. We achieve state-of-the-art results without the need for the demonstrations or simulation used in prior approaches. Videos available at: https://sites.google.com/view/learningtofold
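
A minimal sketch of a goal-conditioned, fully convolutional pick-and-place policy in the spirit described above: the current and goal images are concatenated channel-wise and mapped to per-pixel pick and place score maps, and the action is the argmax of each map. Layer sizes and the action-selection helper are illustrative assumptions; the paper's network and its manipulation of visual inputs are not reproduced here.

```python
import torch
import torch.nn as nn

class GoalConditionedPickPlaceFCN(nn.Module):
    """Fully convolutional policy: current + goal images in, pick/place heatmaps out."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 1),  # channel 0: pick scores, channel 1: place scores
        )

    def forward(self, current, goal):
        x = torch.cat([current, goal], dim=1)   # (B, 6, H, W)
        return self.net(x)                      # (B, 2, H, W)

def select_action(heatmaps):
    """Return per-batch (pick_yx, place_yx) pixel coordinates from each score map."""
    _, _, H, W = heatmaps.shape
    flat = heatmaps.flatten(2).argmax(dim=2)            # (B, 2) flattened indices
    rows = torch.div(flat, W, rounding_mode="floor")
    cols = flat % W
    return torch.stack([rows, cols], dim=-1)            # (B, 2, 2)

# Example: one 64x64 observation/goal pair -> pick and place pixels.
policy = GoalConditionedPickPlaceFCN()
maps = policy(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(select_action(maps))
```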

Model-free vision-based shaping of deformable plastic materials

Jan 30, 2020
Andrea Cherubini, Valerio Ortenzi, Akansel Cosgun, Robert Lee, Peter Corke

We address the problem of shaping deformable plastic materials using non-prehensile actions. Shaping plastic objects is challenging, since they are difficult to model and to track visually. We study this problem by using kinetic sand, a plastic toy material which mimics the physical properties of wet sand. Inspired by a pilot study where humans shape kinetic sand, we define two types of actions: pushing the material from the sides and tapping from above. The chosen actions are executed with a robotic arm using image-based visual servoing. From the current and desired view of the material, we define states based on visual features such as the outer contour shape and the pixel luminosity values. These are mapped to actions, which are repeated iteratively to reduce the image error until convergence is reached. For pushing, we propose three methods for mapping the visual state to an action, including heuristic methods and a neural network trained from human actions. We show that it is possible to obtain simple shapes with the kinetic sand without explicitly modeling the material. Our approach is limited in the types of shapes it can achieve; a richer set of action types and multi-step reasoning is needed to achieve more sophisticated shapes.
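
To make the state-to-action mapping concrete, the sketch below shows one illustrative heuristic: if too much material lies outside the desired contour, push it towards the target shape; otherwise, tap the brightest (highest-relief) interior region flat. The masks, threshold, and centroid-based targeting are assumptions for illustration, not the specific heuristics or the learned network evaluated in the paper.

```python
import numpy as np

def select_action(current_mask, desired_mask, luminosity, push_threshold=0.05):
    """Heuristic mapping from visual state to a push or tap action.

    current_mask / desired_mask: binary top-down masks of the material.
    luminosity: grayscale image of the current view, used here as a rough
    proxy for relief inside the contour.
    """
    outside = current_mask & ~desired_mask          # material outside the target shape
    shape_error = outside.sum() / max(desired_mask.sum(), 1)

    if shape_error > push_threshold:
        # Push from the centroid of the excess material towards the centroid
        # of the desired shape.
        start = np.argwhere(outside).mean(axis=0)
        target = np.argwhere(desired_mask).mean(axis=0)
        return ("push", start, target)

    # Otherwise flatten the brightest interior region with a tap from above.
    interior = np.where(current_mask, luminosity, 0)
    tap_point = np.unravel_index(np.argmax(interior), interior.shape)
    return ("tap", np.asarray(tap_point, dtype=float))
```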

* Accepted to The International Journal of Robotics Research (IJRR) 

Evaluating task-agnostic exploration for fixed-batch learning of arbitrary future tasks

Nov 20, 2019
Vibhavari Dasagi, Robert Lee, Jake Bruce, Jürgen Leitner

Deep reinforcement learning has been shown to solve challenging tasks where large amounts of training experience are available, usually obtained online while learning the task. Robotics is a significant potential application domain for many of these algorithms, but generating robot experience in the real world is expensive, especially when each task requires a lengthy online training procedure. Off-policy algorithms can, in principle, learn arbitrary tasks from a sufficiently diverse fixed dataset. In this work, we evaluate popular exploration methods by generating robotics datasets for the purpose of learning to solve tasks completely offline, without any further interaction in the real world. We present results on three popular continuous control tasks in simulation, as well as continuous control of a high-dimensional real robot arm. Code documenting all algorithms, experiments, and hyperparameters is available at https://github.com/qutrobotlearning/batchlearning.
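
The general recipe being evaluated is: generate a dataset once with a task-agnostic exploration policy, then train entirely offline on that fixed batch. The sketch below illustrates this with a classic gym-style environment loop and, for brevity, a discrete-action Q-learning update; the paper's experiments use continuous-control tasks and off-policy algorithms suited to them, so treat the specifics here as assumptions.

```python
import torch
import torch.nn.functional as F

def collect_fixed_dataset(env, exploration_policy, num_steps):
    """Generate a fixed batch of transitions with a task-agnostic exploration policy.

    Assumes the classic gym API: env.reset() -> obs,
    env.step(a) -> (obs, reward, done, info).
    """
    buffer, obs = [], env.reset()
    for _ in range(num_steps):
        action = exploration_policy(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
    return buffer

def offline_q_step(q_net, target_net, batch, optimizer, gamma=0.99):
    """One off-policy TD update on a minibatch drawn from the fixed dataset.

    batch is a tuple of tensors (obs, act, rew, next_obs, done); no further
    environment interaction happens during training.
    """
    obs, act, rew, next_obs, done = batch
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rew + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```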

Mirroring to Build Trust in Digital Assistants

Apr 02, 2019
Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff

We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user. In particular, these experiments are designed to measure whether users prefer and trust an assistant whose conversational style matches their own. To this end we conducted a user study where subjects interacted with a digital assistant that responded in a way that either matched their conversational style, or did not. Using self-reported personality attributes and subjects' feedback on the interactions, we built models that can reliably predict a user's preferred conversational style.
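
As a sketch of the final prediction step only (not the user study itself), one could fit a simple classifier on the self-reported personality attributes and report cross-validated accuracy as a measure of how reliably preferred style can be predicted. The file names, feature layout, and binary style labels below are placeholders, not the paper's data or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical layout: each row holds one subject's self-reported personality
# attributes; the label encodes which conversational style that subject preferred.
X = np.load("personality_features.npy")   # (num_subjects, num_attributes) -- placeholder path
y = np.load("preferred_style.npy")        # (num_subjects,) binary labels   -- placeholder path

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)  # held-out estimate of prediction reliability
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```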

* Preprint 

Zero-shot Sim-to-Real Transfer with Modular Priors

Sep 20, 2018
Robert Lee, Serena Mou, Vibhavari Dasagi, Jake Bruce, Jürgen Leitner, Niko Sünderhauf

Current end-to-end Reinforcement Learning (RL) approaches are severely limited by restrictively large search spaces and are prone to overfitting to their training environment. This is because, in end-to-end RL, perception, decision-making and low-level control are all learned jointly from very sparse reward signals, with little capability of incorporating prior knowledge or existing algorithms. In this work, we propose a novel framework that effectively decouples RL for high-level decision making from low-level perception and control. This allows us to transfer a learned policy from a highly abstract simulation to a real robot without requiring any transfer learning. We therefore coin our approach zero-shot sim-to-real transfer. We successfully demonstrate our approach on the robot manipulation task of object sorting. A key component of our approach is a deep sets encoder that enables us to learn the high-level policy with reinforcement learning from the variable-length output of a pre-trained object detector, instead of from raw pixels. We show that this method can learn effective policies within mere minutes of experience in a highly simplified simulation. The learned policies can be directly deployed on a robot without further training, and generalize to variations of the task unseen during training.
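
The deep sets encoder mentioned above can be sketched as follows: each detection vector is embedded independently, the embeddings are summed to obtain a permutation-invariant pooled representation, and a second network maps the pooled vector to a fixed-size state for the RL policy. Layer sizes and the detection feature layout are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DeepSetsEncoder(nn.Module):
    """Permutation-invariant encoder for a variable-length set of detections.

    Each detection is a small feature vector (e.g. class id, bounding-box
    coordinates, confidence) from a pre-trained object detector. Every element
    is embedded independently (phi), the embeddings are summed, and the pooled
    vector is mapped to a fixed-size state encoding (rho) for the RL policy.
    """

    def __init__(self, det_dim, hidden_dim=64, out_dim=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(det_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, out_dim))

    def forward(self, detections):
        # detections: (num_detections, det_dim); summation makes the encoding
        # invariant to detection order and tolerant of varying set sizes.
        pooled = self.phi(detections).sum(dim=0)
        return self.rho(pooled)

# Example: 5 detections with 6 features each -> one fixed-size state vector.
encoder = DeepSetsEncoder(det_dim=6)
state = encoder(torch.randn(5, 6))   # shape: (32,)
```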

* Submitted to ICRA 2019, under review 