Dept. of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genoa, Genoa, Italy
Abstract:Fluent human--robot collaboration requires robots to continuously estimate human behaviour and anticipate future intentions. This entails reasoning jointly about \emph{continuous movements} and \emph{discrete actions}, which are still largely modelled in isolation. In this paper, we introduce \textsf{MA-HERP}, a hierarchical and recursive probabilistic framework for the \emph{joint estimation and prediction} of human movements and actions. The model combines: (i) a hierarchical representation in which movements compose into actions through admissible Allen interval relations, (ii) a unified probabilistic factorisation coupling continuous dynamics, discrete labels, and durations, and (iii) a recursive inference scheme inspired by Bayesian filtering, alternating top-down action prediction with bottom-up sensory evidence. We present a preliminary experimental evaluation based on neural models trained on musculoskeletal simulations of reaching movements, showing accurate motion prediction, robust action inference under noise, and computational performance compatible with on-line human--robot collaboration.
Abstract:Human activity recognition (HAR) is fundamental in human-robot collaboration (HRC), enabling robots to respond to and dynamically adapt to human intentions. This paper introduces a HAR system combining a modular data glove equipped with Inertial Measurement Units and a vision-based tactile sensor to capture hand activities in contact with a robot. We tested our activity recognition approach under different conditions, including offline classification of segmented sequences, real-time classification under static conditions, and a realistic HRC scenario. The experimental results show a high accuracy for all the tasks, suggesting that multiple collaborative settings could benefit from this multi-modal approach.
Abstract:Dialogue-based human-robot interaction requires robot cognitive assistants to maintain persistent user context, recover from underspecified requests, and ground responses in external evidence, while keeping intermediate decisions verifiable. In this paper we introduce JANUS, a cognitive architecture for assistive robots that models interaction as a partially observable Markov decision process and realizes control as a factored controller with typed interfaces. To this aim, Janus (i) decomposes the overall behavior into specialized modules, related to scope detection, intent recognition, memory, inner speech, query generation, and outer speech, and (ii) exposes explicit policies for information sufficiency, execution readiness, and tool grounding. A dedicated memory agent maintains a bounded recent-history buffer, a compact core memory, and an archival store with semantic retrieval, coupled through controlled consolidation and revision policies. Models inspired by the notion of inner speech in cognitive theories provide a control-oriented internal textual flow that validates parameter completeness and triggers clarification before grounding, while a faithfulness constraint ties robot-to-human claims to an evidence bundle combining working context and retrieved tool outputs. We evaluate JANUS through module-level unit tests in a dietary assistance domain grounded on a knowledge graph, reporting high agreement with curated references and practical latency profiles. These results support factored reasoning as a promising path to scalable, auditable, and evidence-grounded robot assistance over extended interaction horizons.
Abstract:We explore the use of inner speech as a mechanism to enhance transparency and trust in social robots for dietary advice. In humans, inner speech structures thought processes and decision-making; in robotics, it improves explainability by making reasoning explicit. This is crucial in healthcare scenarios, where trust in robotic assistants depends on both accurate recommendations and human-like dialogue, which make interactions more natural and engaging. Building on this, we developed a social robot that provides dietary advice, and we provided the architecture with inner speech capabilities to validate user input, refine reasoning, and generate clear justifications. The system integrates large language models for natural language understanding and a knowledge graph for structured dietary information. By making decisions more transparent, our approach strengthens trust and improves human-robot interaction in healthcare. We validated this by measuring the computational efficiency of our architecture and conducting a small user study, which assessed the reliability of inner speech in explaining the robot's behavior.
Abstract:Human activity recognition (HAR) is essential for effective Human-Robot Collaboration (HRC), enabling robots to interpret and respond to human actions. This study evaluates the ability of a vision-based tactile sensor to classify 15 activities, comparing its performance to an IMU-based data glove. Additionally, we propose a multi-modal framework combining tactile and motion data to leverage their complementary strengths. We examined three approaches: motion-based classification (MBC) using IMU data, tactile-based classification (TBC) with single or dual video streams, and multi-modal classification (MMC) integrating both. Offline validation on segmented datasets assessed each configuration's accuracy under controlled conditions, while online validation on continuous action sequences tested online performance. Results showed the multi-modal approach consistently outperformed single-modality methods, highlighting the potential of integrating tactile and motion sensing to enhance HAR systems for collaborative robotics.




Abstract:Effective fall risk assessment is critical for post-stroke patients. The present study proposes a novel, data-informed fall risk assessment method based on the instrumented Timed Up and Go (ITUG) test data, bringing in many mobility measures that traditional clinical scales fail to capture. IFRA, which stands for Instrumented Fall Risk Assessment, has been developed using a two-step process: first, features with the highest predictive power among those collected in a ITUG test have been identified using machine learning techniques; then, a strategy is proposed to stratify patients into low, medium, or high-risk strata. The dataset used in our analysis consists of 142 participants, out of which 93 were used for training (15 synthetically generated), 17 for validation and 32 to test the resulting IFRA scale (22 non-fallers and 10 fallers). Features considered in the IFRA scale include gait speed, vertical acceleration during sit-to-walk transition, and turning angular velocity, which align well with established literature on the risk of fall in neurological patients. In a comparison with traditional clinical scales such as the traditional Timed Up & Go and the Mini-BESTest, IFRA demonstrates competitive performance, being the only scale to correctly assign more than half of the fallers to the high-risk stratum (Fischer's Exact test p = 0.004). Despite the dataset's limited size, this is the first proof-of-concept study to pave the way for future evidence regarding the use of IFRA tool for continuous patient monitoring and fall prevention both in clinical stroke rehabilitation and at home post-discharge.
Abstract:We propose a dataset to study the influence of object-specific characteristics on human pick-and-place movements and compare the quality of the motion kinematics extracted by various sensors. This dataset is also suitable for promoting a broader discussion on general learning problems in the hand-object interaction domain, such as intention recognition or motion generation with applications in the Robotics field. The dataset consists of the recordings of 15 subjects performing 80 repetitions of a pick-and-place action under various experimental conditions, for a total of 1200 pick-and-places. The data has been collected thanks to a multimodal setup composed of multiple cameras, observing the actions from different perspectives, a motion capture system, and a wrist-worn inertial measurement unit. All the objects manipulated in the experiments are identical in shape, size, and appearance but differ in weight and liquid filling, which influences the carefulness required for their handling.




Abstract:This paper presents a comprehensive framework to enhance Human-Robot Collaboration (HRC) in real-world scenarios. It introduces a formalism to model articulated tasks, requiring cooperation between two agents, through a smaller set of primitives. Our implementation leverages Hierarchical Task Networks (HTN) planning and a modular multisensory perception pipeline, which includes vision, human activity recognition, and tactile sensing. To showcase the system's scalability, we present an experimental scenario where two humans alternate in collaborating with a Baxter robot to assemble four pieces of furniture with variable components. This integration highlights promising advancements in HRC, suggesting a scalable approach for complex, cooperative tasks across diverse applications.




Abstract:Dexterous in-hand manipulation is a unique and valuable human skill requiring sophisticated sensorimotor interaction with the environment while respecting stability constraints. Satisfying these constraints with generated motions is essential for a robotic platform to achieve reliable in-hand manipulation skills. Explicitly modelling these constraints can be challenging, but they can be implicitly modelled and learned through experience or human demonstrations. We propose a learning and control approach based on dictionaries of motion primitives generated from human demonstrations. To achieve this, we defined an optimization process that combines motion primitives to generate robot fingertip trajectories for moving an object from an initial to a desired final pose. Based on our experiments, our approach allows a robotic hand to handle objects like humans, adhering to stability constraints without requiring explicit formalization. In other words, the proposed motion primitive dictionaries learn and implicitly embed the constraints crucial to the in-hand manipulation task.
Abstract:Hands are a fundamental tool humans use to interact with the environment and objects. Through hand motions, we can obtain information about the shape and materials of the surfaces we touch, modify our surroundings by interacting with objects, manipulate objects and tools, or communicate with other people by leveraging the power of gestures. For these reasons, sensorized gloves, which can collect information about hand motions and interactions, have been of interest since the 1980s in various fields, such as Human-Machine Interaction (HMI) and the analysis and control of human motions. Over the last 40 years, research in this field explored different technological approaches and contributed to the popularity of wearable custom and commercial products targeting hand sensorization. Despite a positive research trend, these instruments are not widespread yet outside research environments and devices aimed at research are often ad hoc solutions with a low chance of being reused. This paper aims to provide a systematic literature review for custom gloves to analyze their main characteristics and critical issues, from the type and number of sensors to the limitations due to device encumbrance. The collection of this information lays the foundation for a standardization process necessary for future breakthroughs in this research field.