Object Goal Navigation (ObjectNav) task is to navigate an agent to an object instance in unseen environments. The traditional navigation paradigm plans the shortest path on a pre-built map. Inspired by this, we propose an object goal navigation framework, which could directly perform path planning based on an estimated distance map. Specifically, our model takes a birds-eye-view semantic map as input, and estimates the distance from the map cells to the target object based on the learned prior knowledge. With the estimated distance map, the agent could explore the environment and navigate to the target objects based on either human-designed or learned navigation policy. Empirical results in visually realistic simulation environments show that the proposed method outperforms a wide range of baselines on success rate and efficiency.
Despite the impressive progress achieved in robust grasp detection, robots are not skilled in sophisticated grasping tasks (e.g. search and grasp a specific object in clutter). Such tasks involve not only grasping, but comprehensive perception of the visual world (e.g. the relationship between objects). Recently, the advanced deep learning techniques provide a promising way for understanding the high-level visual concepts. It encourages robotic researchers to explore solutions for such hard and complicated fields. However, deep learning usually means data-hungry. The lack of data severely limits the performance of deep-learning-based algorithms. In this paper, we present a new dataset named \regrad to sustain the modeling of relationships among objects and grasps. We collect the annotations of object poses, segmentations, grasps, and relationships in each image for comprehensive perception of grasping. Our dataset is collected in both forms of 2D images and 3D point clouds. Moreover, since all the data are generated automatically, users are free to import their own object models for the generation of as many data as they want. We have released our dataset and codes. A video that demonstrates the process of data generation is also available.
Learning a robust representation of robotic grasping from point clouds is a crucial but challenging task. In this paper, we propose an end-to-end single-shot grasp detection network taking one single-view point cloud as input for parallel grippers. Our network includes three stages: Score Network (SN), Grasp Region Network (GRN) and Refine Network (RN). Specifically, SN is designed to select positive points with high grasp confidence. GRN coarsely generates a set of grasp proposals on selected positive points. Finally, RN refines the detected grasps based on local grasp features. To further improve the performance, we propose a grasp anchor mechanism, in which grasp anchors are introduced to generate grasp proposal. Moreover, we contribute a large-scale grasp dataset without manual annotation based on the YCB dataset. Experiments show that our method significantly outperforms several successful point-cloud based grasp detection methods including GPD, PointnetGPD, as well as S$^4$G.