Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Francesco Rea

Exploring Temporal Representation in Neural Processes for Multimodal Action Prediction

Apr 09, 2026

Marco Gabriele Fedozzi, Yukie Nagai, Francesco Rea, Alessandra Sciutti

Abstract:Inspired by the human ability to understand and predict others, we study the applicability of Conditional Neural Processes (CNP) to the task of self-supervised multimodal action prediction in robotics. Following recent results regarding the ontogeny of the Mirror Neuron System (MNS), we focus on the preliminary objective of self-actions prediction. We find a good MNS-inspired model in the existing Deep Modality Blending Network (DMBN), able to reconstruct the visuo-motor sensory signal during a partially observed action sequence by leveraging the probabilistic generation of CNP. After a qualitative and quantitative evaluation, we highlight its difficulties in generalizing to unseen action sequences, and identify the cause in its inner representation of time. Therefore, we propose a revised version, termed DMBN-Positional Time Encoding (DMBN-PTE), that facilitates learning a more robust representation of temporal information, and provide preliminary results of its effectiveness in expanding the applicability of the architecture. DMBN-PTE figures as a first step in the development of robotic systems that autonomously learn to forecast actions on longer time scales refining their predictions with incoming observations.

* Submitted to the AIC 2023 (9th International Workshop on Artificial Intelligence and Cognition)

Via

Access Paper or Ask Questions

Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot

Jul 16, 2025

Luca Garello, Francesca Cocchella, Alessandra Sciutti, Manuel Catalano, Francesco Rea

Figure 1 for Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot

Figure 2 for Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot

Figure 3 for Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot

Figure 4 for Next-Gen Museum Guides: Autonomous Navigation and Visitor Interaction with an Agentic Robot

Abstract:Autonomous robots are increasingly being tested into public spaces to enhance user experiences, particularly in cultural and educational settings. This paper presents the design, implementation, and evaluation of the autonomous museum guide robot Alter-Ego equipped with advanced navigation and interactive capabilities. The robot leverages state-of-the-art Large Language Models (LLMs) to provide real-time, context aware question-and-answer (Q&A) interactions, allowing visitors to engage in conversations about exhibits. It also employs robust simultaneous localization and mapping (SLAM) techniques, enabling seamless navigation through museum spaces and route adaptation based on user requests. The system was tested in a real museum environment with 34 participants, combining qualitative analysis of visitor-robot conversations and quantitative analysis of pre and post interaction surveys. Results showed that the robot was generally well-received and contributed to an engaging museum experience, despite some limitations in comprehension and responsiveness. This study sheds light on HRI in cultural spaces, highlighting not only the potential of AI-driven robotics to support accessibility and knowledge acquisition, but also the current limitations and challenges of deploying such technologies in complex, real-world environments.

Via

Access Paper or Ask Questions

Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning

Apr 02, 2025

Luca Garello, Giulia Belgiovine, Gabriele Russo, Francesco Rea, Alessandra Sciutti

Figure 1 for Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning

Figure 2 for Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning

Figure 3 for Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning

Figure 4 for Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning

Abstract:Integrating robotics into everyday scenarios like tutoring or physical training requires robots capable of adaptive, socially engaging, and goal-oriented interactions. While Large Language Models show promise in human-like communication, their standalone use is hindered by memory constraints and contextual incoherence. This work presents a multimodal, cognitively inspired framework that enhances LLM-based autonomous decision-making in social and task-oriented Human-Robot Interaction. Specifically, we develop an LLM-based agent for a robot trainer, balancing social conversation with task guidance and goal-driven motivation. To further enhance autonomy and personalization, we introduce a memory system for selecting, storing and retrieving experiences, facilitating generalized reasoning based on knowledge built across different interactions. A preliminary HRI user study and offline experiments with a synthetic dataset validate our approach, demonstrating the system's ability to manage complex interactions, autonomously drive training tasks, and build and retrieve contextual memories, advancing socially intelligent robotics.

* Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

Via

Access Paper or Ask Questions

Toward a Universal Concept of Artificial Personality: Implementing Robotic Personality in a Kinova Arm

Jan 12, 2025

Alice Nardelli, Lorenzo Landolfi, Dario Pasquali, Antonio Sgorbissa, Francesco Rea, Carmine Recchiuto

Figure 1 for Toward a Universal Concept of Artificial Personality: Implementing Robotic Personality in a Kinova Arm

Figure 2 for Toward a Universal Concept of Artificial Personality: Implementing Robotic Personality in a Kinova Arm

Figure 3 for Toward a Universal Concept of Artificial Personality: Implementing Robotic Personality in a Kinova Arm

Figure 4 for Toward a Universal Concept of Artificial Personality: Implementing Robotic Personality in a Kinova Arm

Abstract:The fundamental role of personality in shaping interactions is increasingly being exploited in robotics. A carefully designed robotic personality has been shown to improve several key aspects of Human-Robot Interaction (HRI). However, the fragmentation and rigidity of existing approaches reveal even greater challenges when applied to non-humanoid robots. On one hand, the state of the art is very dispersed; on the other hand, Industry 4.0 is moving towards a future where humans and industrial robots are going to coexist. In this context, the proper design of a robotic personality can lead to more successful interactions. This research takes a first step in that direction by integrating a comprehensive cognitive architecture built upon the definition of robotic personality - validated on humanoid robots - into a robotic Kinova Jaco2 arm. The robot personality is defined through the cognitive architecture as a vector in the three-dimensional space encompassing Conscientiousness, Extroversion, and Agreeableness, affecting how actions are executed, the action selection process, and the internal reaction to environmental stimuli. Our main objective is to determine whether users perceive distinct personalities in the robot, regardless of its shape, and to understand the role language plays in shaping these perceptions. To achieve this, we conducted a user study comprising 144 sessions of a collaborative game between a Kinova Jaco2 arm and participants, where the robot's behavior was influenced by its assigned personality. Furthermore, we compared two conditions: in the first, the robot communicated solely through gestures and action choices, while in the second, it also utilized verbal interaction.

Via

Access Paper or Ask Questions

Let people fail! Exploring the influence of explainable virtual and robotic agents in learning-by-doing tasks

Nov 15, 2024

Marco Matarese, Francesco Rea, Katharina J. Rohlfing, Alessandra Sciutti

Figure 1 for Let people fail! Exploring the influence of explainable virtual and robotic agents in learning-by-doing tasks

Figure 2 for Let people fail! Exploring the influence of explainable virtual and robotic agents in learning-by-doing tasks

Figure 3 for Let people fail! Exploring the influence of explainable virtual and robotic agents in learning-by-doing tasks

Figure 4 for Let people fail! Exploring the influence of explainable virtual and robotic agents in learning-by-doing tasks

Abstract:Collaborative decision-making with artificial intelligence (AI) agents presents opportunities and challenges. While human-AI performance often surpasses that of individuals, the impact of such technology on human behavior remains insufficiently understood, primarily when AI agents can provide justifiable explanations for their suggestions. This study compares the effects of classic vs. partner-aware explanations on human behavior and performance during a learning-by-doing task. Three participant groups were involved: one interacting with a computer, another with a humanoid robot, and a third one without assistance. Results indicated that partner-aware explanations influenced participants differently based on the type of artificial agents involved. With the computer, participants enhanced their task completion times. At the same time, those interacting with the humanoid robot were more inclined to follow its suggestions, although they did not reduce their timing. Interestingly, participants autonomously performing the learning-by-doing task demonstrated superior knowledge acquisition than those assisted by explainable AI (XAI). These findings raise profound questions and have significant implications for automated tutoring and human-AI collaboration.

Via

Access Paper or Ask Questions

The Effects of Selected Object Features on a Pick-and-Place Task: a Human Multimodal Dataset

Jul 18, 2024

Linda Lastrico, Valerio Belcamino, Alessandro Carfì, Alessia Vignolo, Alessandra Sciutti, Fulvio Mastrogiovanni, Francesco Rea

Abstract:We propose a dataset to study the influence of object-specific characteristics on human pick-and-place movements and compare the quality of the motion kinematics extracted by various sensors. This dataset is also suitable for promoting a broader discussion on general learning problems in the hand-object interaction domain, such as intention recognition or motion generation with applications in the Robotics field. The dataset consists of the recordings of 15 subjects performing 80 repetitions of a pick-and-place action under various experimental conditions, for a total of 1200 pick-and-places. The data has been collected thanks to a multimodal setup composed of multiple cameras, observing the actions from different perspectives, a motion capture system, and a wrist-worn inertial measurement unit. All the objects manipulated in the experiments are identical in shape, size, and appearance but differ in weight and liquid filling, which influences the carefulness required for their handling.

* The International Journal of Robotics Research. 2024;43(1):98-109.
* Camera ready version. Full paper available in open-access at https://doi.org/10.1177/02783649231210965 Dataset available on Kaggle (DOI: 10.34740/KAGGLE/DS/2319925, https://www.kaggle.com/datasets/alessandrocarf/effects-of-object-characteristics-on-manipulations)

Via

Access Paper or Ask Questions

How much informative is your XAI? A decision-making assessment task to objectively measure the goodness of explanations

Dec 07, 2023

Marco Matarese, Francesco Rea, Alessandra Sciutti

Figure 1 for How much informative is your XAI? A decision-making assessment task to objectively measure the goodness of explanations

Figure 2 for How much informative is your XAI? A decision-making assessment task to objectively measure the goodness of explanations

Figure 3 for How much informative is your XAI? A decision-making assessment task to objectively measure the goodness of explanations

Abstract:There is an increasing consensus about the effectiveness of user-centred approaches in the explainable artificial intelligence (XAI) field. Indeed, the number and complexity of personalised and user-centred approaches to XAI have rapidly grown in recent years. Often, these works have a two-fold objective: (1) proposing novel XAI techniques able to consider the users and (2) assessing the \textit{goodness} of such techniques with respect to others. From these new works, it emerged that user-centred approaches to XAI positively affect the interaction between users and systems. However, so far, the goodness of XAI systems has been measured through indirect measures, such as performance. In this paper, we propose an assessment task to objectively and quantitatively measure the goodness of XAI systems in terms of their \textit{information power}, which we intended as the amount of information the system provides to the users during the interaction. Moreover, we plan to use our task to objectively compare two XAI techniques in a human-robot decision-making task to understand deeper whether user-centred approaches are more informative than classical ones.

Via

Access Paper or Ask Questions

Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot

Nov 09, 2023

Carlo Mazzola, Francesco Rea, Alessandra Sciutti

Figure 1 for Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot

Figure 2 for Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot

Figure 3 for Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot

Abstract:Addressee Estimation is the ability to understand to whom a person is talking, a skill essential for social robots to interact smoothly with humans. In this sense, it is one of the problems that must be tackled to develop effective conversational agents in multi-party and unstructured scenarios. As humans, one of the channels that mainly lead us to such estimation is the non-verbal behavior of speakers: first of all, their gaze and body pose. Inspired by human perceptual skills, in the present work, a deep-learning model for Addressee Estimation relying on these two non-verbal features is designed, trained, and deployed on an iCub robot. The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction compared to previous tests on the dataset used for the training.

* 4 pages, 3 figures, paper presented at IRIM-3D 2023 Conference, Funded by the Horizon-Widera-2021 European Twinning project TERAIS: G.A. n. 101079338

Via

Access Paper or Ask Questions

Expressing and Inferring Action Carefulness in Human-to-Robot Handovers

Sep 30, 2023

Linda Lastrico, Nuno Ferreira Duarte, Alessandro Carfì, Francesco Rea, Alessandra Sciutti, Fulvio Mastrogiovanni, José Santos-Victor

Abstract:Implicit communication plays such a crucial role during social exchanges that it must be considered for a good experience in human-robot interaction. This work addresses implicit communication associated with the detection of physical properties, transport, and manipulation of objects. We propose an ecological approach to infer object characteristics from subtle modulations of the natural kinematics occurring during human object manipulation. Similarly, we take inspiration from human strategies to shape robot movements to be communicative of the object properties while pursuing the action goals. In a realistic HRI scenario, participants handed over cups - filled with water or empty - to a robotic manipulator that sorted them. We implemented an online classifier to differentiate careful/not careful human movements, associated with the cups' content. We compared our proposed "expressive" controller, which modulates the movements according to the cup filling, against a neutral motion controller. Results show that human kinematics is adjusted during the task, as a function of the cup content, even in reach-to-grasp motion. Moreover, the carefulness during the handover of full cups can be reliably inferred online, well before action completion. Finally, although questionnaires did not reveal explicit preferences from participants, the expressive robot condition improved task efficiency.

* PREPRINT VERSION Accepted for publication at the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2023)

Via

Access Paper or Ask Questions

To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

Aug 21, 2023

Carlo Mazzola, Marta Romeo, Francesco Rea, Alessandra Sciutti, Angelo Cangelosi

Figure 1 for To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

Figure 2 for To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

Figure 3 for To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

Figure 4 for To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

Abstract:Communicating shapes our social word. For a robot to be considered social and being consequently integrated in our social environment it is fundamental to understand some of the dynamics that rule human-human communication. In this work, we tackle the problem of Addressee Estimation, the ability to understand an utterance's addressee, by interpreting and exploiting non-verbal bodily cues from the speaker. We do so by implementing an hybrid deep learning model composed of convolutional layers and LSTM cells taking as input images portraying the face of the speaker and 2D vectors of the speaker's body posture. Our implementation choices were guided by the aim to develop a model that could be deployed on social robots and be efficient in ecological scenarios. We demonstrate that our model is able to solve the Addressee Estimation problem in terms of addressee localisation in space, from a robot ego-centric point of view.

* 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-10
* Accepted version of a paper published at 2023 International Joint Conference on Neural Networks (IJCNN). Please find the published version and info to cite the paper at https://doi.org/10.1109/IJCNN54540.2023.10191452 . 10 pages, 8 Figures, 3 Tables

Via

Access Paper or Ask Questions