This article introduces a framework for complex human-robot collaboration tasks, such as the co-manufacturing of furniture. For these tasks, it is essential to encode tasks from human demonstration and reproduce these skills in a compliant and safe manner. Therefore, two key components are addressed in this work: motion generation and shared autonomy. We propose a motion generator based on a time-invariant potential field, capable of encoding wrench profiles, complex and closed-loop trajectories, and additionally incorporates obstacle avoidance. Additionally, the paper addresses shared autonomy (SA) which enables synergetic collaboration between human operators and robots by dynamically allocating authority. Variable impedance control (VIC) and force control are employed, where impedance and wrench are adapted based on the human-robot autonomy factor derived from interaction forces. System passivity is ensured by an energy-tank based task passivation strategy. The framework's efficacy is validated through simulations and an experimental study employing a Franka Emika Research 3 robot.
Integrating robots into populated environments is a complex challenge that requires an understanding of human social dynamics. In this work, we propose to model social motion forecasting in a shared human-robot representation space, which facilitates us to synthesize robot motions that interact with humans in social scenarios despite not observing any robot in the motion training. We develop a transformer-based architecture called ECHO, which operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios. Contrary to prior works, we reformulate the social motion problem as the refinement of the predicted individual motions based on the surrounding agents, which facilitates the training while allowing for single-motion forecasting when only one human is in the scene. We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin while being efficient and performing in real-time. Additionally, our qualitative results showcase the effectiveness of our approach in generating human-robot interaction behaviors that can be controlled via text commands.
Robots are becoming increasingly integrated into our lives, assisting us in various tasks. To ensure effective collaboration between humans and robots, it is essential that they understand our intentions and anticipate our actions. In this paper, we propose a Human-Object Interaction (HOI) anticipation framework for collaborative robots. We propose an efficient and robust transformer-based model to detect and anticipate HOIs from videos. This enhanced anticipation empowers robots to proactively assist humans, resulting in more efficient and intuitive collaborations. Our model outperforms state-of-the-art results in HOI detection and anticipation in VidHOI dataset with an increase of 1.76% and 1.04% in mAP respectively while being 15.4 times faster. We showcase the effectiveness of our approach through experimental results in a real robot, demonstrating that the robot's ability to anticipate HOIs is key for better Human-Robot Interaction. More information can be found on our project webpage: https://evm7.github.io/HOI4ABOT_page/
Recently, several approaches have attempted to combine motion generation and control in one loop to equip robots with reactive behaviors, that cannot be achieved with traditional time-indexed tracking controllers. These approaches however mainly focused on positions, neglecting the orientation part which can be crucial to many tasks e.g. screwing. In this work, we propose a control algorithm that adapts the robot's rotational motion and impedance in a closed-loop manner. Given a first-order Dynamical System representing an orientation motion plan and a desired rotational stiffness profile, our approach enables the robot to follow the reference motion with an interactive behavior specified by the desired stiffness, while always being aware of the current orientation, represented as a Unit Quaternion (UQ). We rely on the Lie algebra to formulate our algorithm, since unlike positions, UQ feature constraints that should be respected in the devised controller. We validate our proposed approach in multiple robot experiments, showcasing the ability of our controller to follow complex orientation profiles, react safely to perturbations, and fulfill physical interaction tasks.
This paper introduces a novel approach for human-to-robot motion retargeting, enabling robots to mimic human motion with precision while preserving the semantics of the motion. For that, we propose a deep learning method for direct translation from human to robot motion. Our method does not require annotated paired human-to-robot motion data, which reduces the effort when adopting new robots. To this end, we first propose a cross-domain similarity metric to compare the poses from different domains (i.e., human and robot). Then, our method achieves the construction of a shared latent space via contrastive learning and decodes latent representations to robot motion control commands. The learned latent space exhibits expressiveness as it captures the motions precisely and allows direct motion control in the latent space. We showcase how to generate in-between motion through simple linear interpolation in the latent space between two projected human poses. Additionally, we conducted a comprehensive evaluation of robot control using diverse modality inputs, such as texts, RGB videos, and key-poses, which enhances the ease of robot control to users of all backgrounds. Finally, we compare our model with existing works and quantitatively and qualitatively demonstrate the effectiveness of our approach, enhancing natural human-robot communication and fostering trust in integrating robots into daily life.
The synthesis of human motion has traditionally been addressed through task-dependent models that focus on specific challenges, such as predicting future motions or filling in intermediate poses conditioned on known key-poses. In this paper, we present a novel task-independent model called UNIMASK-M, which can effectively address these challenges using a unified architecture. Our model obtains comparable or better performance than the state-of-the-art in each field. Inspired by Vision Transformers (ViTs), our UNIMASK-M model decomposes a human pose into body parts to leverage the spatio-temporal relationships existing in human motion. Moreover, we reformulate various pose-conditioned motion synthesis tasks as a reconstruction problem with different masking patterns given as input. By explicitly informing our model about the masked joints, our UNIMASK-M becomes more robust to occlusions. Experimental results show that our model successfully forecasts human motion on the Human3.6M dataset. Moreover, it achieves state-of-the-art results in motion inbetweening on the LaFAN1 dataset, particularly in long transition periods. More information can be found on the project website https://sites.google.com/view/estevevallsmascaro/publications/unimask-m.
In this paper, we present a novel learning-based shared control framework. This framework deploys first-order Dynamical Systems (DS) as motion generators providing the desired reference motion, and a Variable Stiffness Dynamical Systems (VSDS) \cite{chen2021closed} for haptic guidance. We show how to shape several features of our controller in order to achieve authority allocation, local motion refinement, in addition to the inherent ability of the controller to automatically synchronize with the human state during joint task execution. We validate our approach in a teleoperated task scenario, where we also showcase the ability of our framework to deal with situations that require updating task knowledge due to possible changes in the task scenario, or changes in the environment. Finally, we conduct a user study to compare the performance of our VSDS controller for guidance generation to two state-of-the-art controllers in a target reaching task. The result shows that our VSDS controller has the highest successful rate of task execution among all conditions. Besides, our VSDS controller helps reduce the execution time and task load significantly, and was selected as the most favorable controller by participants.
In this paper, we present a controller that combines motion generation and control in one loop, to endow robots with reactivity and safety. In particular, we propose a control approach that enables to follow the motion plan of a first order Dynamical System (DS) with a variable stiffness profile, in a closed loop configuration where the controller is always aware of the current robot state. This allows the robot to follow a desired path with an interactive behavior dictated by the desired stiffness. We also present two solutions to enable a robot to follow the desired velocity profile, in a manner similar to trajectory tracking controllers, while maintaining the closed-loop configuration. Additionally, we exploit the concept of energy tanks in order to guarantee the passivity during interactions with the environment, as well as the asymptotic stability in free motion, of our closed-loop system. The developed approach is evaluated extensively in simulation, as well as in real robot experiments, in terms of performance and safety both in free motion and during the execution of physical interaction tasks.
Understanding the human-object interactions (HOIs) from a video is essential to fully comprehend a visual scene. This line of research has been addressed by detecting HOIs from images and lately from videos. However, the video-based HOI anticipation task in the third-person view remains understudied. In this paper, we design a framework to detect current HOIs and anticipate future HOIs in videos. We propose to leverage human gaze information since people often fixate on an object before interacting with it. These gaze features together with the scene contexts and the visual appearances of human-object pairs are fused through a spatio-temporal transformer. To evaluate the model in the HOI anticipation task in a multi-person scenario, we propose a set of person-wise multi-label metrics. Our model is trained and validated on the VidHOI dataset, which contains videos capturing daily life and is currently the largest video HOI dataset. Experimental results in the HOI detection task show that our approach improves the baseline by a great margin of 36.3% relatively. Moreover, we conduct an extensive ablation study to demonstrate the effectiveness of our modifications and extensions to the spatio-temporal transformer. Our code is publicly available on https://github.com/nizhf/hoi-prediction-gaze-transformer.
We propose an extension of the input-output feedback linearization for a class of multivariate systems that are not input-output linearizable in a classical manner. The key observation is that the usual input-output linearization problem can be interpreted as the problem of solving simultaneous linear equations associated with the input gain matrix: thus, even at points where the input gain matrix becomes singular, it is still possible to solve a part of linear equations, by which a subset of input-output relations is made linear or close to be linear. Based on this observation, we adopt the task priority-based approach in the input-output linearization problem. First, we generalize the classical Byrnes-Isidori normal form to a prioritized normal form having a triangular structure, so that the singularity of a subblock of the input gain matrix related to lower-priority tasks does not directly propagate to higher-priority tasks. Next, we present a prioritized input-output linearization via the multi-objective optimization with the lexicographical ordering, resulting in a prioritized semilinear form that establishes input output relations whose subset with higher priority is linear or close to be linear. Finally, Lyapunov analysis on ultimate boundedness and task achievement is provided, particularly when the proposed prioritized input-output linearization is applied to the output tracking problem. This work introduces a new control framework for complex systems having critical and noncritical control issues, by assigning higher priority to the critical ones.