Tendon-based underactuated hands are intended to be simple, compliant and affordable. Often, they are 3D printed and do not include tactile sensors. Hence, performing in-hand object recognition with direct touch sensing is not feasible. Adding tactile sensors can complicate the hardware and introduce extra costs to the robotic hand. Also, the common approach of visual perception may not be available due to occlusions. In this paper, we explore whether kinesthetic haptics can provide indirect information regarding the geometry of a grasped object during in-hand manipulation with an underactuated hand. By solely sensing actuator positions and torques over a period of time during motion, we show that a classifier can recognize an object from a set of trained ones with a high success rate of almost 95%. In addition, implementing a real-time majority vote during manipulation further improves recognition. A trained classifier is also shown to successfully distinguish between shape categories rather than just specific objects.
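To make the recognition pipeline concrete, the following is a minimal sketch of the real-time majority vote described above, assuming some pre-trained per-window classifier (e.g., an SVM or small neural network trained on windows of actuator positions and torques). The class name, window sizes and `predict` interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import Counter, deque

class MajorityVoteRecognizer:
    """Wraps a per-window object classifier with a running majority vote."""

    def __init__(self, classifier, vote_window=15):
        self.classifier = classifier            # any model exposing a predict() method
        self.votes = deque(maxlen=vote_window)  # most recent per-window predictions

    def update(self, actuator_window):
        """actuator_window: array of shape (T, 2 * n_actuators) holding positions and torques."""
        label = self.classifier.predict(actuator_window.reshape(1, -1))[0]
        self.votes.append(label)
        # The current best guess is the most frequent label in the recent vote window.
        return Counter(self.votes).most_common(1)[0][0]
```

Any standard classifier trained on flattened windows of actuator readings could serve as `classifier`; the vote window length is a free parameter.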
Robotic arms are common in various automation processes such as manufacturing lines. However, these highly capable robots are usually relegated to simple repetitive tasks such as pick-and-place. On the other hand, designing an optimal robot for one specific task consumes substantial engineering time and cost. In this paper, we propose a novel concept for optimizing the fitness of a robotic arm to perform a specific task based on human demonstration. The fitness of a robot arm is a measure of its ability to follow recorded human arm and hand paths. The optimization is conducted with a modified variant of Particle Swarm Optimization adapted to the robot design problem. In the proposed approach, we generate an optimal robot design along with the required path to complete the task. The approach could reduce the time-to-market of robotic arms and enable the standardization of modular robotic parts. Novice users could easily apply a minimal robot arm to various tasks. Two test cases of common manufacturing tasks are presented, yielding optimal designs while reducing computational effort by up to 92%.
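For readers unfamiliar with the underlying optimizer, below is a minimal sketch of a plain Particle Swarm Optimization loop of the kind the paper modifies for the robot design problem. The fitness callable, which in the paper would score how well a candidate arm follows the recorded human paths, is left user-supplied; all hyperparameters and bounds are illustrative assumptions.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0)):
    """Maximize a fitness function over a continuous design space (plain PSO, not the paper's variant)."""
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))     # particle positions (candidate designs)
    v = np.zeros_like(x)                                   # particle velocities
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[np.argmax(pbest_f)]                      # best design found so far
    for _ in range(iters):
        r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmax(pbest_f)]
    return gbest, pbest_f.max()

# Toy usage with a fitness whose optimum is at the origin.
best_design, best_score = pso(lambda p: -np.sum(p**2), dim=5)
```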
Robotic hands are an important tool for replacing humans in handling toxic or radioactive materials. However, they are usually highly expensive, and in many cases, once contaminated, they cannot be re-used. Some solutions cope with this challenge by 3D printing parts of a tendon-based hand; however, fabrication then requires additional assembly steps. Therefore, a novice user may have difficulty fabricating a new hand upon contamination of the previous one. We propose the Print-N-Grip (PNG) hand, a tendon-based underactuated mechanism able to adapt to the shape of objects. The hand is fabricated through one-shot 3D printing with no additional engineering effort, and can accommodate any number of fingers, as desired by the practitioner. Due to its low cost, the PNG hand can easily be detached from a universal base for disposal upon contamination, and replaced by a newly printed one. In addition, the PNG hand is scalable such that one can effortlessly resize the computerized model and print it. We present the design of the PNG hand along with experiments that demonstrate its capabilities and high durability.
Human dexterity is an invaluable capability for precise manipulation of objects in complex tasks. The ability of robots to similarly grasp and perform in-hand manipulation of objects is critical for their use in the ever-changing human environment, and for their ability to replace manpower. In recent decades, significant effort has been invested in providing robotic systems with in-hand manipulation capabilities. Initial robotic manipulators followed carefully programmed paths, while later attempts provided solutions based on analytical modeling of motion and contact. However, these have failed to provide practical solutions due to their inability to cope with complex environments and uncertainties. Therefore, the effort has shifted to learning-based approaches, where data is collected from the real world or through simulation during repeated attempts to complete various tasks. The vast majority of learning approaches focus either on learning data-based models that describe the system to some extent or on Reinforcement Learning (RL). RL, in particular, has seen growing interest due to its remarkable ability to generate solutions to problems with minimal human guidance. In this survey paper, we track the developments of learning approaches for in-hand manipulation and explore the challenges and opportunities. This survey is designed both as an introduction for novices in the field, with a glossary of terms, and as a guide to novel advances for advanced practitioners.
Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human-Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short distance range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem by aiming for a recognition distance of up to 25 meters in the context of HRI. We propose a novel deep-learning framework for URGR using solely a simple RGB camera. First, a novel super-resolution model termed HQ-Net is used to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT), which takes the enhanced image as input. GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework also exhibits superior performance compared to human recognition at ultra-range distances. With the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments.
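A rough sketch of the two-stage inference pipeline is given below, with stand-in modules in place of HQ-Net and GViT (the actual architectures are not reproduced here). Module names, layer choices and image sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuperResolver(nn.Module):
    """Stand-in for HQ-Net: upsamples and lightly refines a low-resolution user crop."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode="bicubic", align_corners=False)
        return self.refine(x)

class GestureClassifier(nn.Module):
    """Stand-in for GViT: maps the enhanced image to gesture logits."""
    def __init__(self, n_gestures=6):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16, n_gestures)

    def forward(self, x):
        return self.head(self.backbone(x))

def recognize(low_res_crop, sr_model, classifier):
    """Two-stage URGR inference: enhance the distant user's crop, then classify the gesture."""
    with torch.no_grad():
        enhanced = sr_model(low_res_crop)
        return classifier(enhanced).argmax(dim=-1)

gesture = recognize(torch.rand(1, 3, 32, 32), SuperResolver(), GestureClassifier())
```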
Simulating tactile perception could potentially leverage the learning capabilities of robotic systems in manipulation tasks. However, the reality gap of simulators for high-resolution tactile sensors remains large. Models trained on simulated data often fail in zero-shot inference and require fine-tuning with real data. In addition, work on high-resolution sensors commonly focuses on ones with flat surfaces, while 3D round sensors are essential for dexterous manipulation. In this paper, we propose a bi-directional Generative Adversarial Network (GAN) termed SightGAN. SightGAN builds on the earlier CycleGAN while including two additional loss components aimed at accurately reconstructing the background and contact patterns, including small contact traces. The proposed SightGAN learns real-to-sim and sim-to-real processes over difference images. It is shown to generate real-like synthetic images while maintaining accurate contact positioning. The generated images can be used to train zero-shot models for newly fabricated sensors. Consequently, the resulting sim-to-real generator could be built on top of the tactile simulator to provide a real-world framework. Potentially, the framework can be used to train, for instance, reinforcement learning policies for manipulation tasks. The proposed model is verified in extensive experiments with test data collected from real sensors and is also shown to maintain embedded force information within the tactile images.
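One plausible reading of the composite objective is sketched below: the standard CycleGAN adversarial and cycle-consistency terms plus two reconstruction terms computed over difference images, one for the background and one for the contact region. The exact form of these terms, the masking scheme and the weights are assumptions, not the published loss.

```python
import torch
import torch.nn.functional as F

def sightgan_losses(real_diff, fake_diff, contact_mask, adv_loss, cycle_loss,
                    lam_bg=10.0, lam_contact=10.0):
    """Composite objective sketch: CycleGAN terms plus background / contact reconstruction terms.

    real_diff, fake_diff : difference images (sensor image minus its no-contact reference).
    contact_mask         : binary mask of pixels belonging to the contact region.
    adv_loss, cycle_loss : standard CycleGAN adversarial and cycle-consistency terms, computed elsewhere.
    """
    bg = 1.0 - contact_mask
    # Background term: outside the contact, the generated difference image should stay near zero.
    loss_bg = F.l1_loss(fake_diff * bg, real_diff * bg)
    # Contact term: inside the (possibly tiny) contact region, reconstruct the pattern accurately.
    loss_contact = F.l1_loss(fake_diff * contact_mask, real_diff * contact_mask)
    return adv_loss + cycle_loss + lam_bg * loss_bg + lam_contact * loss_contact

# Toy usage with random tensors (shapes: batch x channels x H x W).
x = torch.rand(1, 3, 64, 64); y = torch.rand_like(x); m = (torch.rand(1, 1, 64, 64) > 0.9).float()
total = sightgan_losses(x, y, m, adv_loss=torch.tensor(0.5), cycle_loss=torch.tensor(1.2))
```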
Many tasks performed by two humans require mutual interaction between arms, such as handing over tools and objects. In order for a robotic arm to interact with a human in the same way, it must reason about the location of the human arm in real-time. Furthermore, to achieve timely interaction, the robot must be able to predict the final target of the human in order to plan and initiate motion beforehand. In this paper, we explore the use of a low-cost wearable device equipped with two inertial measurement units (IMU) for learning reaching motions in real-time applications of Human-Robot Collaboration (HRC). A wearable device can replace or complement visual perception in cases of bad lighting or occlusions in a cluttered environment. We first train a neural-network model to estimate the current location of the arm. Then, we propose a novel model based on a recurrent neural-network to predict the future target of the human arm during motion in real-time. Early prediction of the target grants the robot sufficient time to plan and initiate motion while the human is still moving. The accuracies of the models are analyzed with respect to the features included in the motion representation. Through experiments and real demonstrations with a robotic arm, we show that sufficient accuracy is achieved for feasible HRC without any visual perception. Once trained, the system can be deployed in various spaces with no additional effort. The models exhibit high accuracy for various initial poses of the human arm. Moreover, the trained models are shown to provide high success rates with additional human participants not included in the model training.
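As an illustration of the prediction model, the following is a minimal sketch of a recurrent network that consumes a partial IMU sequence and regresses the final 3D target of the hand. The feature dimension, hidden size and architecture are assumptions rather than the trained model from the paper.

```python
import torch
import torch.nn as nn

class TargetPredictor(nn.Module):
    """Sketch of a recurrent predictor: a partial IMU sequence in, a predicted final hand target out."""
    def __init__(self, n_features=12, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)   # predicted 3D target position of the hand

    def forward(self, imu_seq):
        _, h = self.gru(imu_seq)           # imu_seq: (batch, time, n_features)
        return self.head(h[-1])

# Predict the reaching target from the first part of a motion; the per-step features are
# illustrative, e.g., orientations and accelerations from the two IMUs.
model = TargetPredictor()
partial_motion = torch.rand(1, 40, 12)
predicted_target = model(partial_motion)
```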
Teleoperation enables a user to perform tasks from a remote location. Hence, the user can interact with a long-distance environment through the operation of a robotic system. Often, teleoperation is required in order to perform dangerous tasks (e.g., work in disaster zones or in chemical plants) while keeping the user out of harm's way. Nevertheless, common approaches are often cumbersome and unnatural to use. In this letter, we propose TeleFMG, an approach for teleoperation of a multi-finger robotic hand through natural motions of the user's hand. By using a low-cost wearable Force-Myography (FMG) device, musculoskeletal activity on the user's forearm is mapped to hand poses which, in turn, are mimicked by a robotic hand. The mapping is performed by a data-based model that considers the spatial positions of the sensors on the forearm along with temporal dependencies of the FMG signals. A set of experiments shows the ability of a teleoperator to control a multi-finger hand through intuitive and natural finger motion. Furthermore, transfer to new users is demonstrated.
Tactile sensing is a necessary capability for a robotic hand to perform fine manipulations and interact with the environment. Optical sensors are a promising solution for high-resolution contact estimation. Nevertheless, they are usually not easy to fabricate and require individual calibration in order to acquire sufficient accuracy. In this letter, we propose AllSight, an optical tactile sensor with a round 3D structure designed for robotic in-hand manipulation tasks. AllSight is mostly 3D printed, making it low-cost, modular and durable; it is the size of a human thumb yet offers a large contact surface. We show the ability of AllSight to learn and estimate a full contact state, i.e., contact position, forces and torsion. With that, an experimental benchmark between various configurations of illumination and contact elastomers is provided. Furthermore, the robust design of AllSight provides it with a unique zero-shot capability such that a practitioner can fabricate the open-source design and have a ready-to-use state estimation model. A set of experiments demonstrates the accurate state estimation performance of AllSight.
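To illustrate what full contact-state estimation amounts to, below is a minimal sketch of a regression network mapping a tactile image to contact position, force and torsion. The placeholder architecture and output layout are assumptions and do not correspond to the model released with AllSight.

```python
import torch
import torch.nn as nn

class ContactStateNet(nn.Module):
    """Sketch of contact-state regression: a tactile image in, position, force and torsion out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, 7)   # [x, y, z] contact position, [fx, fy, fz] force, torsion

    def forward(self, tactile_image):
        out = self.head(self.features(tactile_image))
        return {"position": out[:, :3], "force": out[:, 3:6], "torsion": out[:, 6:]}

state = ContactStateNet()(torch.rand(1, 3, 224, 224))
```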
In communication between humans, gestures are often preferred over or complementary to verbal expression since they offer better spatial referral. A finger-pointing gesture conveys vital information regarding some point of interest in the environment. In human-robot interaction, a user can easily direct a robot to a target location, for example, in search and rescue or factory assistance. State-of-the-art approaches for visual pointing estimation often rely on depth cameras, are limited to indoor environments and provide discrete predictions between limited targets. In this paper, we explore the learning of models for robots to understand pointing directives in various indoor and outdoor environments solely based on a single RGB camera. A novel framework is proposed which includes a designated model termed PointingNet. PointingNet recognizes the occurrence of pointing and then approximates the position and direction of the index finger. The model relies on a novel segmentation model for masking any lifted arm. While state-of-the-art human pose estimation models provide a poor pointing angle estimation accuracy of 28°, PointingNet exhibits a mean accuracy of less than 2°. With the pointing information, the target is computed, followed by planning and motion of the robot. The framework is evaluated on two robotic systems, yielding accurate target reaching.
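As an example of how pointing information can be turned into a robot target, the sketch below intersects the estimated finger ray with a ground plane. This is one plausible target computation under the assumption of a known world frame and a flat floor, not necessarily the computation used in the paper.

```python
import numpy as np

def pointing_target(finger_pos, finger_dir, ground_z=0.0):
    """Project the pointing ray from the index finger onto the ground plane z = ground_z.

    finger_pos: 3D position of the index finger (meters, world frame).
    finger_dir: pointing direction vector, as estimated by the pointing model.
    """
    finger_dir = finger_dir / np.linalg.norm(finger_dir)
    if abs(finger_dir[2]) < 1e-6:
        return None                      # ray is parallel to the ground, no intersection
    t = (ground_z - finger_pos[2]) / finger_dir[2]
    if t <= 0:
        return None                      # intersection lies behind the user
    return finger_pos + t * finger_dir   # 3D target point for the robot to reach

# Example: a finger at 1.4 m height pointing forward and slightly downward.
target = pointing_target(np.array([0.0, 0.0, 1.4]), np.array([1.0, 0.0, -0.35]))
```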