Matej Hoffmann

Embodied AI in Machine Learning -- is it Really Embodied?

May 15, 2025

Matej Hoffmann, Shubhan Parag Patni

Abstract:Embodied Artificial Intelligence (Embodied AI) is gaining momentum in the machine learning communities with the goal of leveraging current progress in AI (deep learning, transformers, large language and visual-language models) to empower robots. In this chapter we put this work in the context of "Good Old-Fashioned Artificial Intelligence" (GOFAI) (Haugeland, 1989) and the behavior-based or embodied alternatives (R. A. Brooks 1991; Pfeifer and Scheier 2001). We claim that the AI-powered robots are only weakly embodied and inherit some of the problems of GOFAI. Moreover, we review and critically discuss the possibility of cross-embodiment learning (Padalkar et al. 2024). We identify fundamental roadblocks and propose directions on how to make progress.

* 16 pages, 3 figures

A computational model of infant sensorimotor exploration in the mobile paradigm

Apr 24, 2025

Josua Spisak, Sergiu Tcaci Popescu, Stefan Wermter, Matej Hoffmann, J. Kevin O'Regan

Abstract:We present a computational model of the mechanisms that may determine infants' behavior in the "mobile paradigm". This paradigm has been used in developmental psychology to explore how infants learn the sensory effects of their actions. In this paradigm, a mobile (an articulated and movable object hanging above an infant's crib) is connected to one of the infant's limbs, prompting the infant to preferentially move that "connected" limb. This ability to detect a "sensorimotor contingency" is considered to be a foundational cognitive ability in development. To understand how infants learn sensorimotor contingencies, we built a model that attempts to replicate infant behavior. Our model incorporates a neural network, action-outcome prediction, exploration, motor noise, preferred activity level, and biologically-inspired motor control. We find that simulations with our model replicate the classic findings in the literature showing preferential movement of the connected limb. An interesting observation is that the model sometimes exhibits a burst of movement after the mobile is disconnected, casting light on a similar occasional finding in infants. In addition to these general findings, the simulations also replicate data from two recent more detailed studies using a connection with the mobile that was either gradual or all-or-none. A series of ablation studies further shows that the inclusion of mechanisms of action-outcome prediction, exploration, motor noise, and biologically-inspired motor control was essential for the model to correctly replicate infant behavior. This suggests that these components are also involved in infants' sensorimotor learning.

* 16 pages, 16 figures

Robot Skin with Touch and Bend Sensing using Electrical Impedance Tomography

Mar 17, 2025

Haofeng Chen, Bin Li, Bedrich Himmel, Xiaojie Wang, Matej Hoffmann

Abstract:Flexible electronic skins that simultaneously sense touch and bend are desired in several application areas, such as to cover articulated robot structures. This paper introduces a flexible tactile sensor based on Electrical Impedance Tomography (EIT), capable of simultaneously detecting and measuring contact forces and flexion of the sensor. The sensor integrates a magnetic hydrogel composite and utilizes EIT to reconstruct internal conductivity distributions. Real-time estimation is achieved through the one-step Gauss-Newton method, which dynamically updates reference voltages to accommodate sensor deformation. A convolutional neural network is employed to classify interactions, distinguishing between touch, bending, and idle states using pre-reconstructed images. Experimental results demonstrate an average touch localization error of 5.4 mm (SD 2.2 mm) and average bending angle estimation errors of 1.9$^\circ$ (SD 1.6$^\circ$). The proposed adaptive reference method effectively distinguishes between single- and multi-touch scenarios while compensating for deformation effects. This makes the sensor a promising solution for multimodal sensing in robotics and human-robot collaboration.
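
The one-step Gauss-Newton reconstruction mentioned in the abstract can be illustrated with a minimal sketch. It assumes a precomputed sensitivity (Jacobian) matrix `J`, a Tikhonov regularizer with an identity prior, and an exponential moving average for the reference update; the function names, the regularizer, and the update rule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def one_step_gauss_newton(J, v_ref, v_meas, lam=1e-2):
    """Linearized EIT update: solve (J^T J + lam*I) d_sigma = J^T (v_meas - v_ref).

    J      : (n_measurements, n_elements) sensitivity matrix (assumed given)
    v_ref  : reference boundary voltages
    v_meas : current boundary voltages
    lam    : Tikhonov regularization weight (identity prior is an assumption)
    """
    dv = v_meas - v_ref
    n_elements = J.shape[1]
    H = J.T @ J + lam * np.eye(n_elements)
    return np.linalg.solve(H, J.T @ dv)


def update_reference(v_ref, v_meas, alpha=0.1):
    """Hypothetical adaptive reference: drift v_ref toward the current frame.

    An exponential moving average is only one possible way to absorb slow
    changes such as bending; the paper's actual rule is not specified here.
    """
    return (1.0 - alpha) * v_ref + alpha * v_meas


if __name__ == "__main__":
    # Tiny synthetic problem: 3 measurements, 2 conductivity elements.
    J = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
    v_ref = np.ones(3)
    v_meas = v_ref + J @ np.array([0.5, -0.25])
    print(one_step_gauss_newton(J, v_ref, v_meas, lam=1e-9))
```

Because the system matrix depends only on `J` and `lam`, it can be factorized once offline, which is what makes a one-step scheme attractive for real-time use.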

Large-area Tomographic Tactile Skin with Air Pressure Sensing for Improved Force Estimation

Mar 17, 2025

Haofeng Chen, Bedrich Himmel, Jiri Kubik, Matej Hoffmann, Hyosang Lee

Abstract:This paper presents a dual-channel tactile skin that integrates Electrical Impedance Tomography (EIT) with air pressure sensing to achieve accurate multi-contact force detection. The EIT layer provides spatial contact information, while the air pressure sensor delivers precise total force measurement. Our framework combines these complementary modalities in three steps: deep learning-based EIT image reconstruction, contact area segmentation, and force allocation based on relative conductivity intensities from EIT. The experiments demonstrated an average force estimation error of 15.1% in single-contact scenarios and 20.1% in multi-contact scenarios, without requiring extensive calibration data. This approach effectively addresses the challenge of simultaneously localizing and quantifying multiple contact forces without requiring complex external calibration setups, paving the way for practical and scalable soft robotic skin applications.
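
The force allocation step can be sketched as follows, assuming the total force from the air pressure channel is split across segmented contacts in proportion to each contact's summed EIT conductivity change. The proportional rule and the function name `allocate_forces` are assumptions for illustration; the abstract states only that allocation uses relative conductivity intensities.

```python
def allocate_forces(total_force, contact_intensities):
    """Split a measured total force across contacts.

    total_force         : force reported by the air pressure channel (N)
    contact_intensities : one non-negative value per segmented contact,
                          e.g. summed conductivity change inside its area
    Returns a list of per-contact forces that sums to total_force.
    """
    total_intensity = sum(contact_intensities)
    if total_intensity <= 0:
        # No measurable contact in the EIT image: nothing to allocate.
        return [0.0 for _ in contact_intensities]
    return [total_force * c / total_intensity for c in contact_intensities]


if __name__ == "__main__":
    # Two contacts, the second three times as intense as the first.
    print(allocate_forces(10.0, [1.0, 3.0]))
```

Under this assumption the pressure channel fixes the overall magnitude, so errors in the EIT image affect only how the force is shared between contacts, not the total.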

MuBlE: MuJoCo and Blender simulation Environment and Benchmark for Task Planning in Robot Manipulation

Mar 04, 2025

Michal Nazarczuk, Karla Stepanova, Jan Kristof Behrens, Matej Hoffmann, Krystian Mikolajczyk

Abstract:Current embodied reasoning agents struggle to plan for long-horizon tasks that require physically interacting with the world to obtain the necessary information (e.g. 'sort the objects from lightest to heaviest'). Improving the capabilities of such agents depends heavily on the availability of relevant training environments. To facilitate the development of such systems, we introduce a novel simulation environment (built on top of robosuite) that uses the MuJoCo physics engine and the high-quality Blender renderer to provide realistic visual observations that are also accurate to the physical state of the scene. It is the first simulator focusing on long-horizon robot manipulation tasks while preserving accurate physics modeling. MuBlE can generate multimodal data for training and enables the design of closed-loop methods through environment interaction on two levels: the visual-action loop and the control-physics loop. Together with the simulator, we propose SHOP-VRB2, a new benchmark composed of 10 classes of multi-step reasoning scenarios that require simultaneous visual and physical measurements.

* https://github.com/michaal94/MuBlE. arXiv admin note: substantial text overlap with arXiv:2404.15194

Wandering around: A bioinspired approach to visual attention through object motion sensitivity

Feb 10, 2025

Giulia D'Angelo, Victoria Clerico, Chiara Bartolozzi, Matej Hoffmann, P. Michael Furlong, Alexander Hadjiivanov

Abstract:Active vision enables dynamic visual perception, offering an alternative to static feedforward architectures in computer vision, which rely on large datasets and high computational resources. Biological selective attention mechanisms allow agents to focus on salient Regions of Interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, enhance this capability by capturing asynchronous scene changes, enabling efficient low-latency processing. To distinguish moving objects while the event-based camera is itself in motion, the agent requires an object motion segmentation mechanism to accurately detect targets and center them in the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using Spiking Neural Networks to parallelize computation and adapt to dynamic environments. This work presents a bioinspired attention system, built on a Spiking Convolutional Neural Network, that achieves selective attention through object motion sensitivity. The system generates events via fixational eye movements using a Dynamic Vision Sensor integrated into the Speck neuromorphic hardware, mounted on a Pan-Tilt unit, to identify the ROI and saccade toward it. The system, characterized using ideal gratings and benchmarked against the Event Camera Motion Segmentation Dataset, reaches a mean IoU of 82.2% and a mean SSIM of 96% in multi-object motion segmentation. The detection of salient objects reaches 88.8% accuracy in office scenarios and 89.8% in low-light conditions on the Event-Assisted Low-Light Video Object Segmentation Dataset. A real-time demonstrator shows the system's 0.12 s response to dynamic scenes. Its learning-free design ensures robustness across perceptual scenes, making it a reliable foundation for real-time robotic applications and a basis for more complex architectures.

Empirical Comparison of Four Stereoscopic Depth Sensing Cameras for Robotics Applications

Jan 13, 2025

Lukas Rustler, Vojtech Volprecht, Matej Hoffmann

Abstract:Depth sensing is an essential technology in robotics and many other fields. Many depth sensing (or RGB-D) cameras are available on the market and selecting the best one for your application can be challenging. In this work, we tested four stereoscopic RGB-D cameras that sense the distance by using two images from slightly different views. We empirically compared four cameras (Intel RealSense D435, Intel RealSense D455, StereoLabs ZED 2, and Luxonis OAK-D Pro) in three scenarios: (i) planar surface perception, (ii) plastic doll perception, (iii) household object perception (YCB dataset). We recorded and evaluated more than 3,000 RGB-D frames for each camera. For table-top robotics scenarios with distance to objects up to one meter, the best performance is provided by the D435 camera. For longer distances, the other three models perform better, making them more suitable for some mobile robotics applications. OAK-D Pro additionally offers integrated AI modules (e.g., object and human keypoint detection). ZED 2 is not a standalone device and requires a computer with a GPU for depth data acquisition. All data (more than 12,000 RGB-D frames) are publicly available at https://osf.io/f2seb.
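
For context, the way a stereoscopic camera senses distance from two slightly different views can be sketched with the standard rectified pinhole relation Z = f * B / d. The function name and the numeric values below are illustrative assumptions and do not correspond to any of the four tested cameras.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point from a rectified stereo pair.

    focal_px     : focal length in pixels
    baseline_m   : distance between the two camera centers in meters
    disparity_px : horizontal pixel shift of the point between the views
    Returns depth in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera: 640 px focal length, 5 cm baseline.
    for d in (64.0, 32.0, 16.0):
        print(d, depth_from_disparity(640.0, 0.05, d))
```

Since depth is inversely proportional to disparity, a fixed disparity error of one pixel translates into a depth error that grows roughly with the square of the distance.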

Enhancing Robustness in Manipulability Assessment: The Pseudo-Ellipsoid Approach

Dec 25, 2024

Erfan Shahriari, Kim Kirstin Peper, Matej Hoffmann, Sami Haddadin

Abstract:Manipulability analysis is a methodology employed to assess the capacity of an articulated system, at a specific configuration, to produce motion or exert force in diverse directions. The conventional method entails generating a virtual ellipsoid using the system's configuration and model. Yet, this approach poses challenges when applied to systems such as the human body, where direct access to such information is limited, necessitating reliance on estimations. Any inaccuracies in these estimations can distort the ellipsoid's configuration, potentially compromising the accuracy of the manipulability assessment. To address this issue, this article extends the standard approach by introducing the concept of the manipulability pseudo-ellipsoid. Through a series of theoretical analyses, simulations, and experiments, the article demonstrates that the proposed method exhibits reduced sensitivity to noise in sensory information, consequently enhancing the robustness of the approach.

* 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
* 8 pages, 10 figures
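
The conventional manipulability ellipsoid that the article takes as its starting point can be sketched for a planar two-link arm. This shows only the standard construction from the Jacobian, not the proposed pseudo-ellipsoid; the link lengths and function names are assumptions for illustration.

```python
import numpy as np


def planar_2link_jacobian(q1, q2, l1=1.0, l2=1.0):
    """End-effector position Jacobian of a planar two-link arm."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])


def manipulability_ellipsoid(J):
    """Principal axes (columns of U) and semi-axis lengths of the velocity
    ellipsoid, obtained from the singular value decomposition of J."""
    U, s, _ = np.linalg.svd(J)
    return U, s


def yoshikawa_measure(J):
    """Scalar manipulability sqrt(det(J J^T)), proportional to the ellipsoid volume."""
    return float(np.sqrt(np.linalg.det(J @ J.T)))


if __name__ == "__main__":
    J = planar_2link_jacobian(0.0, np.pi / 2)
    axes, lengths = manipulability_ellipsoid(J)
    print(lengths, yoshikawa_measure(J))
```

Every quantity here is computed from joint angles and link lengths, which is why estimation errors in those inputs, as with a human body, distort the resulting ellipsoid.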

Boosting Safe Human-Robot Collaboration Through Adaptive Collision Sensitivity

Sep 30, 2024

Lukas Rustler, Matej Misar, Matej Hoffmann

Abstract:What is considered safe for a robot operator during physical human-robot collaboration (HRC) is specified in corresponding HRC standards (e.g., the European ISO/TS 15066). The regime that allows collisions between the moving robot and the operator, called Power and Force Limiting (PFL), restricts the permissible contact forces. Using the same fixed contact thresholds on the entire robot surface results in significant and unnecessary productivity losses, as the robot needs to stop even when impact forces are within limits. Here we present a framework for setting the protective skin thresholds individually for different parts of the robot body and dynamically on the fly, based on the effective mass of each robot link and the link velocity. We perform experiments on a 6-axis collaborative robot arm (UR10e) completely covered with a sensitive skin (AIRSKIN) consisting of eleven individual pads. On a mock pick-and-place scenario with both transient and quasi-static collisions, we demonstrate how skin sensitivity influences the task performance and exerted force. We show an increase in productivity of almost 50% from the most conservative setting of collision thresholds to the most adaptive setting, while ensuring safety for human operators. The method is applicable to any robot for which the effective mass can be calculated.

* Submitted to ICRA 2025
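
Why effective mass and link velocity jointly determine whether a collision stays within limits can be sketched with the simplified two-body contact model commonly used with ISO/TS 15066, in which the kinetic energy of the reduced mass is absorbed by a linear spring representing the body region. All numeric values below are illustrative placeholders, not limits from the standard, and the mapping from this relation to actual skin thresholds in the paper is not reproduced here.

```python
import math


def reduced_mass(m_human, m_robot):
    """Reduced mass of the two-body system (kg)."""
    return 1.0 / (1.0 / m_human + 1.0 / m_robot)


def peak_contact_force(v_rel, k, m_human, m_robot):
    """Peak force (N) when the kinetic energy 0.5*mu*v^2 is stored in a
    linear spring of stiffness k (N/m): F = v * sqrt(mu * k)."""
    return v_rel * math.sqrt(reduced_mass(m_human, m_robot) * k)


def max_relative_speed(f_max, k, m_human, m_robot):
    """Largest relative speed (m/s) that keeps the peak force at or below f_max."""
    return f_max / math.sqrt(reduced_mass(m_human, m_robot) * k)


if __name__ == "__main__":
    # Placeholder values: 100 N limit, 20 kN/m stiffness, 1 kg body region.
    for m_robot in (1.0, 5.0, 20.0):
        print(m_robot, max_relative_speed(100.0, 20000.0, 1.0, m_robot))
```

In this model a link with a lower effective mass may move faster, or be given a less sensitive threshold, for the same force limit, which is the intuition behind setting thresholds per link rather than uniformly.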

Adaptive Electronic Skin Sensitivity for Safe Human-Robot Interaction

Sep 10, 2024

Lukas Rustler, Matej Misar, Matej Hoffmann

Abstract:Artificial electronic skins covering complete robot bodies can make physical human-robot collaboration safe and hence possible. Standards for collaborative robots (e.g., ISO/TS 15066) prescribe permissible forces and pressures during contacts with the human body. These characteristics of the collision depend on the speed of the colliding robot link but also on its effective mass. Thus, to warrant contacts that comply with the Power and Force Limiting (PFL) collaborative regime while maximizing productivity, protective skin thresholds should be set individually for different parts of the robot body and dynamically at run time. Here we present and empirically evaluate four scenarios: (a) static and uniform, with fixed thresholds for the whole skin, (b) static but with different settings for individual robot body parts, (c) dynamically set based on the velocity of every link, (d) dynamically set based on the effective mass of every robot link. We perform experiments in simulation and on a real 6-axis collaborative robot arm (UR10e) completely covered with sensitive skin (AIRSKIN) comprising eleven individual pads. On a mock pick-and-place scenario with transient collisions with the robot body parts and two collision reactions (stop and avoid), we demonstrate the boost in productivity in going from the most conservative setting of the skin thresholds (a) to the most adaptive setting (d). The threshold settings for every skin pad are adapted at a frequency of 25 Hz. This work can be easily extended to platforms with more degrees of freedom and larger skin coverage (humanoids) and to social human-robot interaction scenarios where contacts with the robot will be used for communication.
