Recent deep-learning-based approaches to small defect segmentation are trained in specific settings and tend to be limited by their fixed context. During training, the network inevitably learns the representation of the training data's background before it learns to pick out the defects; once the context changes at inference time, performance drops, and the only remedy is retraining in every new setting. This severely limits practical robotic applications, where contexts keep varying. To cope with this, instead of training a network context by context and hoping it generalizes, why not stop misleading it with any particular context and train it purely in simulation? In this paper, we propose SSDS, a network that learns to distinguish small defects between two images regardless of context, so that it can be trained once and for all. A small defect detection layer utilizing the pose sensitivity of phase correlation between images is introduced, followed by an outlier masking layer. The network is trained on randomly generated simulated data with simple shapes and generalizes to the real world. Finally, SSDS is validated on real-world collected data, demonstrating that even when trained only in cheap simulation, it can still find small defects in the real world, which shows its effectiveness and potential for practical applications.
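The cue this detection layer builds on is classical phase correlation: normalizing the cross-power spectrum of two images discards texture magnitude and keeps only relative pose information. Below is a minimal NumPy sketch of that classical operation only, not the learned layers around it; the function name is ours.

```python
import numpy as np

def phase_correlation(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Classical phase correlation: a sharp peak at the relative shift
    between two images; pose-sensitive, hence useful as a change cue."""
    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    cross_power = fa * np.conj(fb)
    cross_power /= np.abs(cross_power) + 1e-8  # keep phase, drop magnitude
    corr = np.fft.ifft2(cross_power).real
    return np.fft.fftshift(corr)               # peak location encodes the shift

# Toy usage: a shifted copy of random "texture" gives a single sharp peak.
rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = np.roll(a, shift=(3, -5), axis=(0, 1))
peak = np.unravel_index(np.argmax(phase_correlation(a, b)), (64, 64))
print(peak)  # offset from the image center recovers the shift (up to sign convention)
```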
Ground robots are prone to collisions because they can only sense danger and react once they are already close to an obstacle, which is usually too late to avoid a crash and can cause severe damage. To address this issue, we present a collaboration between aerial and ground robots for recognizing the feasible region. Exploiting the aerial robot's advantage of viewing the route the ground robot is on from varied viewpoints and scales, the collaboration provides global road-segmentation information to the ground robot, enabling it to obtain the feasible region and adjust its pose ahead of time. Normally, the transformation between the two platforms can be obtained by GPS, but with considerable error, which directly degrades the recognition of the feasible region. We therefore refine the transformation using the deep phase correlation network (DPCN), a state-of-the-art method for matching heterogeneous sensor measurements with excellent performance on heterogeneous mapping. The network is lightweight and generalizes well. We evaluate on the Aero-Ground dataset, which consists of heterogeneous sensor images and aerial road-segmentation images. The results show that our collaborative system achieves high accuracy, speed, and stability.
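Once the transformation is refined, applying it amounts to warping the aerial road mask into the ground robot's frame. A minimal OpenCV sketch of that final step, assuming a similarity transform (rotation, scale, translation) as the refined output; the function and parameter names are ours and the magnitudes are illustrative.

```python
import cv2
import numpy as np

def warp_aerial_mask(road_mask, angle_deg, scale, tx, ty):
    """Warp an aerial road-segmentation mask into the ground robot's frame
    using a refined similarity transform (rotation, scale, translation)."""
    h, w = road_mask.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, scale)
    M[:, 2] += (tx, ty)  # add the refined translation
    return cv2.warpAffine(road_mask, M, (w, h), flags=cv2.INTER_NEAREST)

# Toy usage with a hypothetical refined transform.
mask = np.zeros((256, 256), np.uint8)
mask[100:156, :] = 255  # a straight "road" band
aligned = warp_aerial_mask(mask, angle_deg=12.0, scale=1.1, tx=5, ty=-3)
```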
Place recognition is critical for both offline mapping and online localization. However, current single-sensor place recognition remains challenging in adverse conditions. In this paper, a heterogeneous-measurement framework is proposed for long-term place recognition, which retrieves query radar scans from an existing lidar map. To achieve this, a deep neural network is jointly trained in the learning stage; in the testing stage, shared embeddings of radar and lidar are extracted for heterogeneous place recognition. To validate the effectiveness of the proposed method, we conduct evaluation and generalization tests on multi-session public datasets against other competitive methods. The experimental results indicate that our model can perform multiple kinds of place recognition, lidar-to-lidar, radar-to-radar, and radar-to-lidar, while being trained only once. We also release the source code publicly.
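At test time, retrieval in a shared embedding space reduces to nearest-neighbour search, which is what makes the cross-modal query possible. A minimal NumPy sketch, assuming the network outputs fixed-length embeddings; the names and dimensions are ours.

```python
import numpy as np

def retrieve(query_emb: np.ndarray, map_embs: np.ndarray, k: int = 5):
    """Nearest-neighbour retrieval in a shared embedding space: e.g. a radar
    query against lidar map embeddings (or any modality combination, since
    both are mapped into the same space)."""
    q = query_emb / np.linalg.norm(query_emb)
    m = map_embs / np.linalg.norm(map_embs, axis=1, keepdims=True)
    sims = m @ q                  # cosine similarity to every mapped place
    return np.argsort(-sims)[:k]  # indices of the top-k candidate places

# Toy usage with random 256-D vectors standing in for network outputs.
rng = np.random.default_rng(0)
lidar_map = rng.normal(size=(1000, 256))
radar_query = lidar_map[42] + 0.1 * rng.normal(size=256)  # noisy revisit
print(retrieve(radar_query, lidar_map))  # index 42 should rank first
```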
Object 6D pose estimation is an important research topic in computer vision due to its wide range of applications and the challenges posed by the complexity and variability of the real world. We argue that fully exploiting the spatial relationships between points helps improve pose estimation performance, especially in scenes with background clutter and partial occlusion, yet this information has usually been ignored in previous work using RGB images or RGB-D data. In this paper, we propose a framework for 6D pose estimation from RGB-D data based on the spatial structure of 3D keypoints. We adopt point-wise dense feature embeddings to vote for 3D keypoints, which makes full use of the structure information of the rigid body. After a CNN predicts the direction vectors pointing to the keypoints, we use RANSAC voting to compute the 3D keypoint coordinates; the pose transformation can then be easily obtained by the least squares method. In addition, a spatial dimension sampling strategy for points is employed, which lets the method achieve excellent performance on small training sets. The proposed method is verified on two benchmark datasets, LINEMOD and OCCLUSION LINEMOD. The experimental results show that our method outperforms the state-of-the-art approaches, achieving ADD(-S) accuracy of 98.7\% on LINEMOD and 52.6\% on OCCLUSION LINEMOD in real time.
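The final least-squares step, recovering the rigid transform between predicted model keypoints and their observed 3D locations, is the classical SVD-based (Kabsch) solution. A minimal sketch of that standard step, assuming the keypoints are already voted; names are ours.

```python
import numpy as np

def rigid_transform_3d(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) aligning model keypoints `src`
    to observed keypoints `dst` via SVD (Kabsch)."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Toy usage: recover a known rotation/translation from 8 keypoints.
rng = np.random.default_rng(0)
kps = rng.normal(size=(8, 3))
Q = np.linalg.qr(rng.normal(size=(3, 3)))[0]
R_gt = Q if np.linalg.det(Q) > 0 else -Q    # proper rotation
t_gt = np.array([0.1, -0.2, 0.3])
R, t = rigid_transform_3d(kps, kps @ R_gt.T + t_gt)
print(np.allclose(R, R_gt), np.allclose(t, t_gt))  # True True
```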
This work presents an approach that lets robots carry out complex applications involving multiple additional constraints or subtasks (e.g. obstacle and self-collision avoidance) despite insufficient redundancy. The proposed approach, based on a novel subtask-merging strategy, enforces all subtasks in due course by dynamically modulating a virtual secondary task, in which task status and soft priority are incorporated to improve the overall efficiency of redundancy resolution. It greatly improves redundancy availability by unitizing and deploying subtasks in a fine-grained and compact manner. We build our control framework on null-space projection, which guarantees that executing the subtasks does not interfere with the primary task. Experimental results on two case studies demonstrate the performance of our approach.
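The non-interference guarantee comes from the standard null-space projection of resolved-rate control: q̇ = J⁺ẋ + (I − J⁺J)q̇₀, where the projector (I − J⁺J) maps any secondary velocity into motions invisible to the primary task. A minimal sketch of that textbook building block (the paper's merging and modulation logic is not shown); names are ours.

```python
import numpy as np

def null_space_control(J, x_dot_task, q_dot_sub):
    """Resolved-rate control with null-space projection: the secondary
    (subtask) velocity is projected into the null space of the primary
    task Jacobian, so it cannot disturb the primary task."""
    J_pinv = np.linalg.pinv(J)
    N = np.eye(J.shape[1]) - J_pinv @ J   # null-space projector
    return J_pinv @ x_dot_task + N @ q_dot_sub

# Toy usage: 7-DOF arm, 6-D primary task, arbitrary subtask velocity.
rng = np.random.default_rng(0)
J = rng.normal(size=(6, 7))
q_dot = null_space_control(J, x_dot_task=np.zeros(6),
                           q_dot_sub=rng.normal(size=7))
print(np.allclose(J @ q_dot, 0))  # True: subtask motion is task-invisible
```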
Place recognition is indispensable for a drift-free localization system. Due to environmental variations, place recognition using a single modality has limitations. In this paper, we propose a bi-modal place recognition method that extracts a compound global descriptor from two modalities, vision and LiDAR. Specifically, we build an elevation image from the point cloud modality as a discriminative structural representation. Based on the 3D information, we derive correspondences between 3D points and image pixels, through which pixel-wise visual features can be inserted into the elevation-map grids. In this way, we fuse the structural and visual features in a consistent bird's-eye-view frame, yielding a semantic feature representation with sensible geometry, named CORAL. Comparisons on the Oxford RobotCar dataset show that CORAL has superior performance against other state-of-the-art methods. We also demonstrate that our network generalizes to other scenes and sensor configurations on cross-city datasets.
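Building the elevation image is a straightforward bird's-eye-view rasterization of the point cloud. A minimal NumPy sketch of one common variant (max height per grid cell); grid size, cell resolution, and the empty-cell fill value are our assumptions.

```python
import numpy as np

def elevation_image(points, grid_size=128, cell=0.5):
    """Rasterize a point cloud (N x 3, ego-centred) into a bird's-eye-view
    elevation image: each grid cell stores the maximum z of its points."""
    half = grid_size * cell / 2.0
    img = np.full((grid_size, grid_size), -np.inf, dtype=np.float32)
    ij = np.floor((points[:, :2] + half) / cell).astype(int)
    keep = np.all((ij >= 0) & (ij < grid_size), axis=1)
    ij, z = ij[keep], points[keep, 2]
    np.maximum.at(img, (ij[:, 1], ij[:, 0]), z)  # max-pool heights per cell
    img[np.isinf(img)] = 0.0                     # empty cells -> ground level
    return img

# Toy usage with a random cloud standing in for a lidar scan.
cloud = np.random.default_rng(0).uniform(-30, 30, size=(10000, 3))
print(elevation_image(cloud).shape)  # (128, 128)
```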
Moving through dynamic pedestrian environments is one of the key requirements for autonomous mobile robots. We present a model-based reinforcement learning approach for robots to navigate through crowded environments. The navigation policy is trained with both real interaction data from multi-agent simulation and virtual data from a deep transition model that predicts how the dynamics surrounding the robot will evolve. The model takes a laser scan sequence and the robot's own state as input and outputs steering control. The laser sequence is further transformed into stacked local obstacle maps disentangled from the robot's ego motion, separating static and dynamic obstacles and simplifying model training. We observe that our method can be trained with significantly less real interaction data in the simulator yet achieves a similar success rate on the social navigation task compared with other methods. Experiments were conducted in multiple social scenarios both in simulation and on real robots; the learned policy guides the robots to their targets successfully while avoiding pedestrians in a socially compliant manner. Code is available at https://github.com/YuxiangCui/model-based-social-navigation
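The obstacle-map representation starts from rasterizing each laser scan into an ego-centric occupancy grid; stacking several such grids after ego-motion compensation is what lets static structure and moving pedestrians be told apart. A minimal sketch of the per-scan rasterization only, with our own names and grid parameters.

```python
import numpy as np

def scan_to_obstacle_map(ranges, angles, grid_size=64, cell=0.1):
    """Rasterize one laser scan into an ego-centric occupancy map; stacking
    several such maps (compensated for ego motion) separates static
    structure from moving obstacles."""
    xy = np.stack([ranges * np.cos(angles), ranges * np.sin(angles)], axis=1)
    ij = np.floor(xy / cell).astype(int) + grid_size // 2
    keep = np.all((ij >= 0) & (ij < grid_size), axis=1)
    grid = np.zeros((grid_size, grid_size), dtype=np.float32)
    grid[ij[keep, 1], ij[keep, 0]] = 1.0  # mark hit cells as occupied
    return grid

# Toy usage: a 360-degree scan of a circular "wall" 2 m away.
angles = np.linspace(-np.pi, np.pi, 360)
print(scan_to_obstacle_map(np.full(360, 2.0), angles).sum())  # occupied cells
```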
We aim to develop an efficient programming method for equipping service robots with the skill of performing sign language motions. This paper addresses the problem of transferring complex dual-arm sign language motions, characterized by coordination among the arms and hands, from human to robot, which has seldom been considered in previous studies of motion retargeting. We propose a novel motion retargeting method that leverages graph optimization and Dynamic Movement Primitives (DMPs) for this problem. We employ DMPs in a leader-follower manner to parameterize the original trajectories while preserving motion rhythm and the relative movements between human body parts, and adopt a three-step optimization procedure to find deformed trajectories for robot motion planning that remain feasible for robot execution. Several Chinese Sign Language (CSL) motions have been successfully performed on ABB's YuMi dual-arm collaborative robot (14 DOF) equipped with two 6-DOF Inspire-Robotics multi-fingered hands, a system with 26 DOFs in total.
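A DMP represents a trajectory as a stable point attractor, τż = α(β(g − y) − z) + f(x), τẏ = z, plus a learned forcing term f gated by a decaying phase x, so rhythm and shape are preserved while start and goal can be changed. A minimal 1-D sketch of this standard building block (the paper's leader-follower coupling and graph optimization are not shown); the gains are common textbook values.

```python
import numpy as np

def rollout_dmp(y0, g, forcing, tau=1.0, dt=0.01, alpha=25.0, beta=6.25):
    """Integrate a minimal 1-D discrete DMP:
    tau*dy = z, tau*dz = alpha*(beta*(g - y) - z) + f(x),
    with a phase variable x decaying from 1 toward 0."""
    alpha_x = 8.0
    y, z, x = y0, 0.0, 1.0
    traj = []
    while x > 1e-3:
        f = forcing(x) * x * (g - y0)   # phase-gated, goal-scaled forcing
        z += dt / tau * (alpha * (beta * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)  # canonical (phase) system
        traj.append(y)
    return np.array(traj)

# Toy usage: zero forcing gives a smooth point-to-point reach to the goal.
path = rollout_dmp(y0=0.0, g=1.0, forcing=lambda x: 0.0)
print(path[-1])  # converges near 1.0
```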
Visual localization for planar moving robots is important to various indoor service robotic applications. To handle textureless areas and frequent human activity in indoor environments, a novel robust visual localization algorithm that leverages dense correspondences and sparse depth for planar moving robots is proposed. The key component is a minimal solution that computes the absolute camera pose from one 3D-2D correspondence and one 2D-2D correspondence. The advantages are twofold. First, robustness is enhanced because the sample set for pose estimation is maximal: all correspondences, with or without depth, are utilized. Second, no extra dense map construction is required to exploit dense correspondences in textureless or repetitively textured scenes, which matters because building a dense map is computationally expensive, especially at large scale. Moreover, a probabilistic analysis of the different solutions is presented, and an automatic solution selection mechanism is designed to maximize the success rate by choosing appropriate solutions for different environmental characteristics. Finally, a complete visual localization pipeline covering different correspondence and depth densities is summarized and validated both in simulation and on a public real-world indoor localization dataset. The code is released on GitHub.
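The paper's minimal solvers and probabilistic selection are specific to its derivation; purely as an illustration of the surrounding machinery, here is a hypothetical hypothesize-and-verify (RANSAC-style) skeleton in which each minimal sample is routed to a solver according to how many of its correspondences carry depth. All names, the stand-in solvers, and the scorer are ours.

```python
import numpy as np

def ransac_pose(corrs, solvers, score, n_iters=500, rng=None):
    """RANSAC skeleton for mixed correspondences: each sampled pair is
    dispatched to a (hypothetical) minimal solver keyed by the number of
    sampled correspondences that have depth."""
    rng = rng or np.random.default_rng()
    best_pose, best_inliers = None, -1
    for _ in range(n_iters):
        idx = rng.choice(len(corrs), size=2, replace=False)
        sample = [corrs[i] for i in idx]
        solver = solvers.get(sum(c["depth"] is not None for c in sample))
        if solver is None:
            continue
        for pose in solver(sample):  # minimal solver -> candidate poses
            inliers = sum(score(pose, c) for c in corrs)
            if inliers > best_inliers:
                best_pose, best_inliers = pose, inliers
    return best_pose

# Dummy usage: stand-in solvers and scorer, shown only for control flow.
corrs = [{"uv": np.zeros(2), "depth": (i % 3 == 0) or None} for i in range(30)]
solvers = {0: lambda s: [np.eye(4)], 1: lambda s: [np.eye(4)]}
pose = ransac_pose(corrs, solvers, score=lambda p, c: True)
```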
Using a trained model under different conditions without data annotation is attractive for robot applications. Toward this goal, one class of methods translates the image style from the training environment to the current one. Conventional studies on image style translation mainly focus on two settings: paired data, where images from the two domains have exactly aligned content, and unpaired data, where content is independent. In this paper, we propose a new setting in which the content of the two images is aligned only up to an error in pose. We consider this setting more practical, since robots with various sensors can usually align data up to some error level, even across styles. To solve this problem, we propose PRoGAN, which learns a style translator by intentionally transforming the original-domain images with a noisy pose and then matching the distribution of the translated transformed images to the distribution of the target-domain images. The adversarial training forces the network to learn the style translation without becoming entangled with other variations. In addition, we propose two pose-estimation-based self-supervised tasks to further improve performance. Finally, PRoGAN is validated on both simulated and real-world collected data to show its effectiveness, and results on the downstream tasks of classification, road segmentation, object detection, and feature matching show its potential for real applications. Code: https://github.com/wrld/PRoGAN
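The "noisy pose" perturbation can be realized as a random similarity transform whose parameters are recorded, which is also what a pose-estimation self-supervision signal could regress against. A minimal OpenCV sketch of that data-side step only; the function name, parameter ranges, and usage are our assumptions, not PRoGAN's implementation.

```python
import cv2
import numpy as np

def random_pose_transform(img, max_angle=10.0, max_trans=8.0, rng=None):
    """Apply a random similarity transform (rotation + translation) to an
    image, mimicking the noisy-pose alignment between the two domains,
    and return the pose parameters for possible self-supervision."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    angle = rng.uniform(-max_angle, max_angle)
    tx, ty = rng.uniform(-max_trans, max_trans, size=2)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    M[:, 2] += (tx, ty)
    return cv2.warpAffine(img, M, (w, h)), (angle, tx, ty)

# Toy usage: perturb a source-domain image; the returned pose parameters
# could supervise a pose-estimation head.
img = np.random.default_rng(0).integers(0, 255, (128, 128, 3), np.uint8)
warped, pose = random_pose_transform(img)
```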