Inferring causal directions on discrete and categorical data is an important yet challenging problem. Even though the additive noise models (ANMs) approach can be adapted to the discrete data, the functional structure assumptions make it not applicable on categorical data. Inspired by the principle that the cause and mechanism are independent, various methods have been developed, leveraging independence tests such as the distance correlation measure. In this work, we take an alternative perspective and propose a subsampling-based method to test the independence between the generating schemes of the cause and that of the mechanism. Our methodology works for both discrete and categorical data and does not imply any functional model on the data, making it a more flexible approach. To demonstrate the efficacy of our methodology, we compare it with existing baselines over various synthetic data and real data experiments.
Inferring causal directions on discrete and categorical data is an important yet challenging problem. Even though the additive noise models (ANMs) approach can be adapted to the discrete data, the functional structure assumptions make it not applicable on categorical data. Inspired by the principle that the cause and mechanism are independent, various methods have been developed, leveraging independence tests such as the distance correlation measure. In this work, we take an alternative perspective and propose a subsampling-based method to test the independence between the generating schemes of the cause and that of the mechanism. Our methodology works for both discrete and categorical data and does not imply any functional model on the data, making it a more flexible approach. To demonstrate the efficacy of our methodology, we compare it with existing baselines over various synthetic data and real data experiments.
6D grasping in cluttered scenes is a longstanding robotic manipulation problem. Open-loop manipulation pipelines can fail due to modularity and error sensitivity while most end-to-end grasping policies with raw perception inputs have not yet scaled to complex scenes with obstacles. In this work, we propose a new method to close the gap through sampling and selecting plans in the latent space. Our hierarchical framework learns collision-free target-driven grasping based on partial point cloud observations. Our method learns an embedding space to represent expert grasping plans and a variational autoencoder to sample diverse latent plans at inference time. Furthermore, we train a latent plan critic for plan selection and an option classifier for switching to an instance grasping policy through hierarchical reinforcement learning. We evaluate and analyze our method and compare against several baselines in simulation, and demonstrate that the latent planning can generalize to the real-world cluttered-scene grasping task. Our videos and code can be found at https://sites.google.com/view/latent-grasping .
Segmenting unseen object instances in cluttered environments is an important capability that robots need when functioning in unstructured environments. While previous methods have exhibited promising results, they still tend to provide incorrect results in highly cluttered scenes. We postulate that a network architecture that encodes relations between objects at a high-level can be beneficial. Thus, in this work, we propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks. We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the perturbed segmentations. Our proposed method is orthogonal to previous works and achieves state-of-the-art performance when combined with them. We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes. Code, models, and video can be found at https://github.com/chrisdxie/rice .
We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating safe robot grasps in human-to-robot object handover. Dataset and code are available at https://dex-ycb.github.io.
Majority of the perception methods in robotics require depth information provided by RGB-D cameras. However, standard 3D sensors fail to capture depth of transparent objects due to refraction and absorption of light. In this paper, we introduce a new approach for depth completion of transparent objects from a single RGB-D image. Key to our approach is a local implicit neural representation built on ray-voxel pairs that allows our method to generalize to unseen objects and achieve fast inference speed. Based on this representation, we present a novel framework that can complete missing depth given noisy RGB-D input. We further improve the depth estimation iteratively using a self-correcting refinement model. To train the whole pipeline, we build a large scale synthetic dataset with transparent objects. Experiments demonstrate that our method performs significantly better than the current state-of-the-art methods on both synthetic and real world data. In addition, our approach improves the inference speed by a factor of 20 compared to the previous best method, ClearGrasp. Code and dataset will be released at https://research.nvidia.com/publication/2021-03_RGB-D-Local-Implicit.
Learning high-level navigation behaviors has important implications: it enables robots to build compact visual memory for repeating demonstrations and to build sparse topological maps for planning in novel environments. Existing approaches only learn discrete, short-horizon behaviors. These standalone behaviors usually assume a discrete action space with simple robot dynamics, thus they cannot capture the intricacy and complexity of real-world trajectories. To this end, we propose Composable Behavior Embedding (CBE), a continuous behavior representation for long-horizon visual navigation. CBE is learned in an end-to-end fashion; it effectively captures path geometry and is robust to unseen obstacles. We show that CBE can be used to performing memory-efficient path following and topological mapping, saving more than an order of magnitude of memory than behavior-less approaches.
Causal inference using the restricted structural causal model framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms. For linear models and additive noise models, the asymmetry arises from non-Gaussianity or non-linearity, respectively. This methodology can be adapted to stationary time series, however, inferring causal relationships from non-stationary time series remains a challenging task. In the work, we focus on slowly-varying nonstationary processes and propose to break the symmetry by exploiting the nonstationarity of the data. Our main theoretical result shows that causal direction is identifiable in generic cases when cause and effect are connected via a time-varying filter. We propose a causal discovery procedure by leveraging powerful estimates of the bivariate evolutionary spectra. Both synthetic and real-world data simulations that involve high-order and nonsmooth filters are provided to demonstrate the effectiveness of our proposed methodology.
The quality of datasets is one of the key factors that affect the accuracy of aerodynamic data models. For example, in the uniformly sampled Burgers' dataset, the insufficient high-speed data is overwhelmed by massive low-speed data. Predicting high-speed data is more difficult than predicting low-speed data, owing to that the number of high-speed data is limited, i.e. the quality of the Burgers' dataset is not satisfactory. To improve the quality of datasets, traditional methods usually employ the data resampling technology to produce enough data for the insufficient parts in the original datasets before modeling, which increases computational costs. Recently, the mixtures of experts have been used in natural language processing to deal with different parts of sentences, which provides a solution for eliminating the need for data resampling in aerodynamic data modeling. Motivated by this, we propose the multi-task learning (MTL), a datasets quality-adaptive learning scheme, which combines task allocation and aerodynamic characteristics learning together to disperse the pressure of the entire learning task. The task allocation divides a whole learning task into several independent subtasks, while the aerodynamic characteristics learning learns these subtasks simultaneously to achieve better precision. Two experiments with poor quality datasets are conducted to verify the data quality-adaptivity of the MTL to datasets. The results show than the MTL is more accurate than FCNs and GANs in poor quality datasets.
6D robotic grasping beyond top-down bin-picking scenarios is a challenging task. Previous solutions based on 6D grasp synthesis with robot motion planning usually operate in an open-loop setting without considering the dynamics and contacts of objects, which makes them sensitive to grasp synthesis errors. In this work, we propose a novel method for learning closed-loop control policies for 6D robotic grasping using point clouds from an egocentric camera. We combine imitation learning and reinforcement learning in order to grasp unseen objects and handle the continuous 6D action space, where expert demonstrations are obtained from a joint motion and grasp planner. We introduce a goal-auxiliary actor-critic algorithm, which uses grasping goal prediction as an auxiliary task to facilitate policy learning. The supervision on grasping goals can be obtained from the expert planner for known objects or from hindsight goals for unknown objects. Overall, our learned closed-loop policy achieves over 90% success rates on grasping various ShapeNet objects and YCB objects in the simulation. Our video can be found at https://www.youtube.com/watch?v=rKsCRXLykiY&t=1s .