Many complex robotic manipulation tasks can be decomposed as a sequence of pick and place actions. Training a robotic agent to learn this sequence over many different starting conditions typically requires many iterations or demonstrations, especially in 3D environments. In this work, we propose Fourier Transporter (\ours{}) which leverages the two-fold $\SE(d)\times\SE(d)$ symmetry in the pick-place problem to achieve much higher sample efficiency. \ours{} is an open-loop behavior cloning method trained using expert demonstrations to predict pick-place actions on new environments. \ours{} is constrained to incorporate symmetries of the pick and place actions independently. Our method utilizes a fiber space Fourier transformation that allows for memory-efficient construction. We test our proposed network on the RLbench benchmark and achieve state-of-the-art results across various tasks.
Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.
Given point cloud input, the problem of 6-DoF grasp pose detection is to identify a set of hand poses in SE(3) from which an object can be successfully grasped. This important problem has many practical applications. Here we propose a novel method and neural network model that enables better grasp success rates relative to what is available in the literature. The method takes standard point cloud data as input and works well with single-view point clouds observed from arbitrary viewing directions.
Digital pathological analysis is run as the main examination used for cancer diagnosis. Recently, deep learning-driven feature extraction from pathology images is able to detect genetic variations and tumor environment, but few studies focus on differential gene expression in tumor cells. In this paper, we propose a self-supervised contrastive learning framework, HistCode, to infer differential gene expressions from whole slide images (WSIs). We leveraged contrastive learning on large-scale unannotated WSIs to derive slide-level histopathological feature in latent space, and then transfer it to tumor diagnosis and prediction of differentially expressed cancer driver genes. Our extensive experiments showed that our method outperformed other state-of-the-art models in tumor diagnosis tasks, and also effectively predicted differential gene expressions. Interestingly, we found the higher fold-changed genes can be more precisely predicted. To intuitively illustrate the ability to extract informative features from pathological images, we spatially visualized the WSIs colored by the attentive scores of image tiles. We found that the tumor and necrosis areas were highly consistent with the annotations of experienced pathologists. Moreover, the spatial heatmap generated by lymphocyte-specific gene expression patterns was also consistent with the manually labeled WSI.
Transporter Net is a recently proposed framework for pick and place that is able to learn good manipulation policies from a very few expert demonstrations. A key reason why Transporter Net is so sample efficient is that the model incorporates rotational equivariance into the pick module, i.e. the model immediately generalizes learned pick knowledge to objects presented in different orientations. This paper proposes a novel version of Transporter Net that is equivariant to both pick and place orientation. As a result, our model immediately generalizes place knowledge to different place orientations in addition to generalizing pick knowledge as before. Ultimately, our new model is more sample efficient and achieves better pick and place success rates than the baseline Transporter Net model.
Shape completion, the problem of inferring the complete geometry of an object given a partial point cloud, is an important problem in robotics and computer vision. This paper proposes the Graph Attention Shape Completion Network (GASCN), a novel neural network model that solves this problem. This model combines a graph-based model for encoding local point cloud information with an MLP-based architecture for encoding global information. For each completed point, our model infers the normal and extent of the local surface patch which is used to produce dense yet precise shape completions. We report experiments that demonstrate that GASCN outperforms standard shape completion methods on a standard benchmark drawn from the Shapenet dataset.
Quality-relevant fault detection plays an important role in industrial processes, while the current quality-related fault detection methods based on neural networks main concentrate on process-relevant variables and ignore quality-relevant variables, which restrict the application of process monitoring. Therefore, in this paper, a fault detection scheme based on the improved teacher-student network is proposed for quality-relevant fault detection. In the traditional teacher-student network, as the features differences between the teacher network and the student network will cause performance degradation on the student network, representation evaluation block (REB) is proposed to quantify the features differences between the teacher and the student networks, and uncertainty modeling is used to add this difference in modeling process, which are beneficial to reduce the features differences and improve the performance of the student network. Accordingly, REB and uncertainty modeling is applied in the teacher-student network named as uncertainty modeling teacher-student uncertainty autoencoder (TSUAE). Then, the proposed TSUAE is applied to process monitoring, which can effectively detect faults in the process-relevant subspace and quality-relevant subspace simultaneously. The proposed TSUAE-based fault detection method is verified in two simulation experiments illustrating that it has satisfactory fault detection performance compared to other fault detection methods.