
Roberto Calandra


General In-Hand Object Rotation with Vision and Touch

Sep 18, 2023
Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.

* CoRL 2023; Website: https://haozhi.io/rotateit/ 
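
The visuotactile fusion step can be sketched with a single attention head in NumPy. This is an illustrative toy, not the paper's architecture: the token dimensions, the use of a proprioceptive query, and the random projections are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(proprio, tokens, d_model=16, seed=0):
    """One attention head: a proprioceptive query attends over a set of
    visual and tactile tokens and returns a single fused feature."""
    rng = np.random.default_rng(seed)
    d_in = tokens.shape[1]
    Wq = rng.standard_normal((d_in, d_model)) / np.sqrt(d_in)
    Wk = rng.standard_normal((d_in, d_model)) / np.sqrt(d_in)
    Wv = rng.standard_normal((d_in, d_model)) / np.sqrt(d_in)
    q = proprio @ Wq                       # (d_model,)
    K, V = tokens @ Wk, tokens @ Wv        # (n, d_model)
    w = softmax(K @ q / np.sqrt(d_model))  # attention over the n tokens
    return w @ V, w

rng = np.random.default_rng(1)
visual_tokens = rng.standard_normal((4, 8))   # e.g. image patch features
tactile_tokens = rng.standard_normal((2, 8))  # e.g. per-finger touch features
proprio = rng.standard_normal(8)
fused, weights = fuse(proprio, np.vstack([visual_tokens, tactile_tokens]))
```

The attention weights show how much the fused feature draws from each visual or tactile token; in the actual system such weights are learned end-to-end.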

Deep Reinforcement Learning for the Joint Control of Traffic Light Signaling and Vehicle Speed Advice

Sep 18, 2023
Johannes V. S. Busch, Robert Voelckner, Peter Sossalla, Christian L. Vielhaus, Roberto Calandra, Frank H. P. Fitzek

Traffic congestion in dense urban centers presents an economic and environmental burden. In recent years, the availability of vehicle-to-anything communication has allowed detailed vehicle states to be transmitted to the infrastructure, where they can be used for intelligent traffic light control. Conversely, the infrastructure can provide vehicles with advice on driving behavior, such as appropriate velocities, which can improve the efficacy of the traffic system. Several research works have applied deep reinforcement learning to either traffic light control or vehicle speed advice. In this work, we propose a first attempt to jointly learn the control of both, and show that this improves the efficacy of traffic systems. In our experiments, the joint control approach reduces average vehicle trip delays, compared to controlling only traffic lights, in eight out of eleven benchmark scenarios. Analyzing the qualitative behavior of the vehicle speed advice policy, we observe that this is achieved by smoothing out the velocity profile of vehicles near a traffic light. Learning joint control of traffic signaling and speed advice in the real world could help reduce congestion and mitigate the economic and environmental repercussions of today's traffic systems.

* 6 pages, 2 figures, accepted for publication at IEEE ICMLA 2023 
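
One way to set up such joint control is a single policy over the Cartesian product of signal phases and speed-advice bins. The phase names and speed values below are invented for illustration; the paper's action space may be structured differently.

```python
# A joint action space (an assumption for illustration, not necessarily
# the paper's exact design): the Cartesian product of traffic-light phases
# and discrete speed-advice values, controlled by a single RL policy.
PHASES = ["NS_green", "EW_green"]
SPEED_ADVICE_KMH = [30, 50, 70]

def decode_joint_action(a):
    """Map a flat action index to a (phase, advised speed) pair."""
    phase = PHASES[a // len(SPEED_ADVICE_KMH)]
    speed = SPEED_ADVICE_KMH[a % len(SPEED_ADVICE_KMH)]
    return phase, speed

n_actions = len(PHASES) * len(SPEED_ADVICE_KMH)   # 6 joint actions
```

A policy head with `n_actions` logits then picks both decisions at once, which is what allows the two control problems to be learned jointly.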

In-Hand Object Rotation via Rapid Motor Adaptation

Oct 10, 2022
Haozhi Qi, Ashish Kumar, Roberto Calandra, Yi Ma, Jitendra Malik


Generalized in-hand manipulation has long been an unsolved challenge in robotics. As a small step towards this grand goal, we demonstrate how to design and learn a simple adaptive controller that achieves in-hand object rotation using only fingertips. The controller is trained entirely in simulation on only cylindrical objects, and can then, without any fine-tuning, be directly deployed to a real robot hand to rotate dozens of objects with diverse sizes, shapes, and weights about the z-axis. This is achieved via rapid online adaptation of the controller to the object properties using only proprioception history. Furthermore, natural and stable finger gaits automatically emerge from training the control policy via reinforcement learning. Code and more videos are available at https://haozhi.io/hora

* CoRL 2022. Code and Website: https://haozhi.io/hora 
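
The core idea, inferring object properties online from proprioception history, can be illustrated on a toy 1-D system: estimate an unknown mass by least squares from a short history of applied forces and observed responses. The system, noise model, and estimator here are assumptions made for the sketch; the paper instead uses a learned adaptation module.

```python
import numpy as np

def estimate_mass(forces, velocity_deltas, dt=0.01):
    """Least-squares fit of dv = (f / m) * dt, solved for m."""
    f = np.asarray(forces)
    dv = np.asarray(velocity_deltas)
    return dt * (f @ f) / (f @ dv)

rng = np.random.default_rng(0)
true_mass, dt = 0.3, 0.01
forces = rng.uniform(0.5, 2.0, size=50)            # fingertip force history
responses = (forces / true_mass) * dt \
    + rng.normal(0.0, 1e-4, size=50)               # noisy velocity changes
m_hat = estimate_mass(forces, responses, dt)
```

A short interaction history suffices to pin down the hidden parameter, which is the property that makes rapid online adaptation possible.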

Learning Self-Supervised Representations from Vision and Touch for Active Sliding Perception of Deformable Surfaces

Sep 26, 2022
Justin Kerr, Huang Huang, Albert Wilcox, Ryan Hoque, Jeffrey Ichnowski, Roberto Calandra, Ken Goldberg


Humans make extensive use of vision and touch as complementary senses, with vision providing global information about the scene and touch measuring local information during manipulation without suffering from occlusions. In this work, we propose a novel framework for learning multi-task visuo-tactile representations in a self-supervised manner. We design a mechanism which enables a robot to autonomously collect spatially aligned visual and tactile data, a key property for downstream tasks. We then train visual and tactile encoders to embed these paired sensory inputs into a shared latent space using cross-modal contrastive loss. The learned representations are evaluated without fine-tuning on 5 perception and control tasks involving deformable surfaces: tactile classification, contact localization, anomaly detection (e.g., surgical phantom tumor palpation), tactile search from a visual query (e.g., garment feature localization under occlusion), and tactile servoing along cloth edges and cables. The learned representations achieve an 80% success rate on towel feature classification, a 73% average success rate on anomaly detection in surgical materials, a 100% average success rate on vision-guided tactile search, and 87.8% average servo distance along cables and garment seams. These results demonstrate the flexibility of the learned representations and represent a step toward task-agnostic visuo-tactile representation learning for robot control.
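
The cross-modal contrastive objective can be sketched as a symmetric InfoNCE loss over paired embeddings. This NumPy version is a minimal stand-in: the temperature and normalization choices are assumptions, and the paper's encoders are learned networks rather than raw features.

```python
import numpy as np

def infonce_loss(z_vis, z_tac, temperature=0.1):
    """Symmetric cross-modal InfoNCE: row i of z_vis should match row i
    of z_tac (a spatially aligned pair) and repel every other row."""
    z_vis = z_vis / np.linalg.norm(z_vis, axis=1, keepdims=True)
    z_tac = z_tac / np.linalg.norm(z_tac, axis=1, keepdims=True)
    logits = z_vis @ z_tac.T / temperature                 # (n, n) similarities
    v2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    t2v = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -0.5 * (np.diag(v2t).mean() + np.diag(t2v).mean())

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4))
aligned = infonce_loss(z, z + 0.01 * rng.standard_normal((8, 4)))
misaligned = infonce_loss(z, np.roll(z, 1, axis=0))
```

Spatially aligned pairs (same row index) yield a lower loss than misaligned ones, which is exactly the signal the self-supervised training exploits.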


Investigating Compounding Prediction Errors in Learned Dynamics Models

Mar 17, 2022
Nathan Lambert, Kristofer Pister, Roberto Calandra


Accurately predicting the consequences of agents' actions is a key prerequisite for planning in robotic control. Model-based reinforcement learning (MBRL) is one paradigm that relies on the iterative learning and prediction of state-action transitions to solve a task. Deep MBRL has become a popular candidate, using a neural network to learn a dynamics model that, in a single forward pass, predicts the next state from a high-dimensional state and action. These "one-step" predictions are known to become inaccurate when composed over longer horizons, a phenomenon called the compounding error problem. Given the prevalence of the compounding error problem in MBRL and related fields of data-driven control, we set out to understand the properties of, and the conditions causing, these long-horizon errors. In this paper, we explore the effects of subcomponents of a control problem on long-term prediction error, including the choice of system, the collected data, and the trained model. These detailed quantitative studies on simulated and real-world data show that the underlying dynamics of a system are the strongest factor determining the shape and magnitude of prediction error. Given a clearer understanding of compounding prediction error, researchers can implement new types of models, beyond "one-step" prediction, that are more useful for control.

* 25 pages, 19 figures 
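
The compounding error problem itself is easy to reproduce: give a one-step model a small bias and compose it over a horizon. The gain and bias values below are arbitrary; note how the (here linear, unstable) underlying dynamics set the growth rate, echoing the paper's main finding.

```python
import numpy as np

def rollout_errors(horizon, gain=1.05, model_bias=0.01):
    """Roll the true dynamics and a slightly biased one-step model forward
    from the same state; return |prediction - truth| at every step."""
    x_true = x_pred = 1.0
    errors = []
    for _ in range(horizon):
        x_true = gain * x_true               # true (unstable linear) system
        x_pred = gain * x_pred + model_bias  # learned model with a tiny bias
        errors.append(abs(x_pred - x_true))
    return np.array(errors)

errors = rollout_errors(50)
```

Over 50 composed steps, a per-step bias of 0.01 compounds into an error two orders of magnitude larger, and a larger gain would amplify it further.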

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Jan 11, 2022
Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer


The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems, while also limiting its full potential. In many other areas of machine learning, AutoML has shown it is possible to automate such design choices, and it has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in a variety of applications from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL: we provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers going forward.
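
At its simplest, the AutoRL outer loop is a search over an RL agent's design choices. The sketch below uses exhaustive grid search with a hypothetical scoring function standing in for a full training run; real AutoRL methods replace both with far more sample-efficient machinery.

```python
import itertools

# Hypothetical design space for an RL agent (names and values invented).
SEARCH_SPACE = {
    "lr": [1e-4, 3e-4, 1e-3],
    "gamma": [0.95, 0.99, 0.999],
    "hidden": [64, 128, 256],
}

def evaluate(cfg):
    # Stand-in for "train the agent with cfg, return mean episode return".
    return -abs(cfg["lr"] - 3e-4) * 1e4 - abs(cfg["gamma"] - 0.99) * 10

def grid_search(space):
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*space.values()):
        cfg = dict(zip(space, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg

best = grid_search(SEARCH_SPACE)
```

The exponential cost of this loop in the number of design choices is one reason the survey's more sophisticated AutoRL methods exist.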


What Robot do I Need? Fast Co-Adaptation of Morphology and Control using Graph Neural Networks

Nov 03, 2021
Kevin Sebastian Luck, Roberto Calandra, Michael Mistry


The co-adaptation of robot morphology and behaviour becomes increasingly important with the advent of fast 3D-manufacturing methods and efficient deep reinforcement learning algorithms. A major challenge for the application of co-adaptation methods to the real world is the simulation-to-reality gap due to model and simulation inaccuracies. However, prior work focuses primarily on the study of evolutionary adaptation of morphologies exploiting analytical models and (differentiable) simulators with large population sizes, neglecting the existence of the simulation-to-reality gap and the cost of manufacturing cycles in the real world. This paper presents a new approach combining classic high-frequency deep neural networks with computationally expensive Graph Neural Networks for the data-efficient co-adaptation of agents with varying numbers of degrees of freedom. Evaluations in simulation show that the new method can co-adapt agents within a limited number of production cycles by efficiently combining design optimization with offline reinforcement learning, which allows for its direct application to real-world co-adaptation tasks in future work.
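
The outer loop of co-adaptation under a manufacturing budget can be caricatured in a few lines: each iteration stands for one costly production cycle. The single design parameter and the return function are invented for illustration; the paper evaluates candidate designs with Graph Neural Networks and offline RL instead of building and training each one.

```python
import numpy as np

def policy_return(leg_length):
    # Hypothetical stand-in for "train a controller on this morphology
    # and measure its return"; peaks at a leg length of 0.7.
    return -(leg_length - 0.7) ** 2

def co_adapt(budget, seed=0):
    """Toy outer loop: propose a morphology, evaluate its controller,
    keep the best design seen within the production-cycle budget."""
    rng = np.random.default_rng(seed)
    best_design, best_return = None, -np.inf
    for _ in range(budget):          # each iteration = one build
        design = rng.uniform(0.2, 1.2)
        ret = policy_return(design)
        if ret > best_return:
            best_design, best_return = design, ret
    return best_design

small = co_adapt(budget=5)
large = co_adapt(budget=50)
```

Because every iteration corresponds to physically manufacturing a robot, shrinking the number of cycles needed, as the paper's surrogate models do, is the whole game.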


Active 3D Shape Reconstruction from Vision and Touch

Jul 20, 2021
Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal


Humans build 3D understandings of the world through active object exploration, using jointly their senses of vision and touch. However, in 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings, leaving the active exploration of the shape largely unexplored. In active touch sensing for 3D reconstruction, the goal is to actively select the tactile readings that maximize the improvement in shape reconstruction accuracy. However, the development of deep learning-based active touch models is largely limited by the lack of frameworks for shape exploration. In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration. Our framework enables the development of the first fully data-driven solutions to active touch on top of learned models for object understanding. Our experiments show the benefits of such solutions in the task of 3D shape understanding where our models consistently outperform natural baselines. We provide our framework as a tool to foster future research in this direction.
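
A data-driven touch policy can be as simple as greedy uncertainty reduction: repeatedly touch where the current reconstruction is most uncertain. The uncertainty values and decay factor below are invented; the paper learns this acquisition behavior rather than hard-coding it.

```python
import numpy as np

def select_touches(uncertainty, n_touches=3, decay=0.1):
    """Greedily pick the most uncertain surface region to touch next,
    assuming each touch shrinks local uncertainty by a fixed factor."""
    u = np.asarray(uncertainty, dtype=float).copy()
    chosen = []
    for _ in range(n_touches):
        i = int(np.argmax(u))    # region where reconstruction is worst
        chosen.append(i)
        u[i] *= decay            # touching it makes it far more certain
    return chosen, u

chosen, remaining = select_touches([0.2, 0.9, 0.5, 0.8])
```

Such a greedy baseline is one of the "natural baselines" a learned exploration policy has to beat.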


Towards Learning to Play Piano with Dexterous Hands and Touch

Jun 08, 2021
Huazhe Xu, Yuping Luo, Shaoxiong Wang, Trevor Darrell, Roberto Calandra


The virtuoso plays the piano with passion, poetry, and extraordinary technical ability. As Liszt said, a virtuoso "must call up scent and blossom, and breathe the breath of life." The strongest existing piano-playing robots are based on a combination of specialized robot hands/pianos and hardcoded planning algorithms. In contrast, in this paper we demonstrate how an agent can learn, from scratch with reinforcement learning (RL), to play a simulated piano with dexterous hands directly from a machine-readable music score. We demonstrate that the RL agents can not only find the correct key positions but also handle various rhythm, volume, and fingering requirements. We achieve this by using a touch-augmented reward and a novel curriculum of tasks. We conclude by carefully studying the aspects important to enabling such learning, which can potentially shed light on future research in this direction.
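
A touch-augmented reward of the kind described can be sketched as follows; the exact terms and weights are assumptions, since the paper's shaping may differ.

```python
def touch_augmented_reward(pressed, target, touched, w_touch=0.1):
    """Sketch of a touch-augmented reward: score correct key presses,
    penalize wrong ones, and add a small bonus when fingertips register
    contact with target keys (weights here are invented)."""
    pressed, target, touched = set(pressed), set(target), set(touched)
    correct = len(pressed & target)
    wrong = len(pressed - target)
    return correct - wrong + w_touch * len(touched & target)

# MIDI note numbers: target chord C-E-G; two keys pressed, one touched.
r = touch_augmented_reward(pressed=[60, 64], target=[60, 64, 67], touched=[60])
```

The touch bonus gives the agent a dense signal for fingers that are on the right keys before it has learned to press them, which is what makes such shaping useful early in training.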
