Title: Object Permanence Through Audio-Visual Representations

Oct 20, 2020

Fanjun Bu, Chien-Ming Huang

Abstract: As robots perform manipulation tasks and interact with objects, they are likely to drop objects accidentally (e.g., due to an inadequate grasp of an unfamiliar object), and those objects may then bounce out of their visual fields. To enable robots to recover from such errors, we draw upon the concept of object permanence: objects remain in existence even when they are not being sensed (e.g., seen) directly. In particular, we developed a multimodal neural network model that takes a partial, observed bounce trajectory and the audio resulting from drop impact as its inputs and predicts the full bounce trajectory and the end location of a dropped object. We empirically show that (1) our multimodal method predicted end locations close to the actual locations (i.e., within the visual field of the robot's wrist camera) and (2) the robot was able to retrieve dropped objects by applying minimal vision-based pick-up adjustments. Additionally, we show that our multimodal method outperformed the vision-only and audio-only baselines in retrieving dropped objects. Our results provide insights into enabling object permanence for robots and offer foundations for ensuring robust robot autonomy in task execution.
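
The first evaluation criterion in the abstract (a predicted end location close enough that the object falls within the wrist camera's visual field) can be illustrated with a simple geometric check. The following is only a minimal sketch, not the authors' evaluation code: it assumes a downward-facing camera positioned directly above the predicted location and a rectangular ground footprint. The names `Point` and `in_camera_footprint`, the footprint dimensions, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A 2D location on the floor plane, in meters (hypothetical frame)."""
    x: float
    y: float


def in_camera_footprint(predicted: Point, actual: Point,
                        width_m: float = 0.4, height_m: float = 0.3) -> bool:
    """Return True if `actual` lies inside a rectangle centered on `predicted`.

    The rectangle stands in for the ground area seen by a wrist camera
    hovering directly above the predicted end location. The default
    dimensions are illustrative assumptions, not values from the paper.
    """
    dx = abs(actual.x - predicted.x)
    dy = abs(actual.y - predicted.y)
    return dx <= width_m / 2 and dy <= height_m / 2


if __name__ == "__main__":
    predicted = Point(1.00, 0.50)
    # Offset of about (0.15, 0.05) m: inside the assumed 0.4 x 0.3 m footprint.
    print(in_camera_footprint(predicted, Point(1.15, 0.55)))
    # Offset of about (0.30, 0.00) m: outside the assumed footprint along x.
    print(in_camera_footprint(predicted, Point(1.30, 0.50)))
```

Under these assumptions, a prediction counts as usable when the check returns `True`, since the robot could then see the object and apply the small vision-based pick-up adjustments the abstract mentions.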

* 7 pages, 3 figures, 2 tables, submitted to IEEE Robotics and Automation Letters with ICRA 2021 Option
