Chongkun Xia

Learning Language-Conditioned Deformable Object Manipulation with Graph Dynamics

Mar 02, 2023
Kai Mo, Yuhong Deng, Chongkun Xia, Xueqian Wang

Vision-based deformable object manipulation is a challenging problem in robotic manipulation, requiring a robot to infer a sequence of manipulation actions that lead to the desired state from visual observations alone. Most previous works address this problem in a goal-conditioned way and use a goal image to specify the task, which is neither practical nor efficient. We therefore adopt natural language specification and propose a language-conditioned deformable object manipulation policy learning framework. We first design a unified Transformer-based architecture that understands multi-modal data and outputs picking and placing actions. In addition, we introduce the visible connectivity graph to tackle the nonlinear dynamics and complex configurations of deformable objects during manipulation. Both simulated and real experiments demonstrate that the proposed method is general and effective for language-conditioned deformable object manipulation policy learning. Our method achieves much higher success rates on various language-conditioned deformable object manipulation tasks (87.3% on average) than the state-of-the-art method in simulation experiments. It is also much lighter, with a 75.6% shorter inference time than state-of-the-art methods, and it performs well in real-world applications. Supplementary videos can be found at https://sites.google.com/view/language-deformable.
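As a rough illustration of the kind of architecture described (not the authors' code), the following minimal PyTorch sketch fuses language token embeddings with visible-connectivity-graph node features in a shared Transformer encoder and regresses picking and placing points. The class name, feature dimensions, and mean-pooling readout are hypothetical choices for the sketch only.

```python
import torch
import torch.nn as nn

class LanguageGraphPolicy(nn.Module):
    """Hypothetical sketch: fuse language tokens and graph node features
    with a shared Transformer encoder, then regress pick/place points."""
    def __init__(self, lang_dim=512, node_dim=3, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.lang_proj = nn.Linear(lang_dim, d_model)   # project language token embeddings
        self.node_proj = nn.Linear(node_dim, d_model)   # project graph node features (e.g. 3-D positions)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.pick_head = nn.Linear(d_model, 2)          # (x, y) picking point
        self.place_head = nn.Linear(d_model, 2)         # (x, y) placing point

    def forward(self, lang_tokens, node_feats):
        # lang_tokens: (B, L, lang_dim); node_feats: (B, N, node_dim)
        tokens = torch.cat([self.lang_proj(lang_tokens),
                            self.node_proj(node_feats)], dim=1)
        fused = self.encoder(tokens).mean(dim=1)        # pool the multi-modal sequence
        return self.pick_head(fused), self.place_head(fused)

policy = LanguageGraphPolicy()
pick, place = policy(torch.randn(1, 8, 512), torch.randn(1, 64, 3))
print(pick.shape, place.shape)  # torch.Size([1, 2]) torch.Size([1, 2])
```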

* submitted to IROS 2023 

Deep Reinforcement Learning Based on Local GNN for Goal-conditioned Deformable Object Rearranging

Feb 21, 2023
Yuhong Deng, Chongkun Xia, Xueqian Wang, Lipeng Chen

Object rearranging is one of the most common deformable object manipulation tasks, in which the robot needs to rearrange a deformable object into a goal configuration. Previous studies focus on designing an expert system for each specific task with model-based or data-driven approaches, so the application scenarios are limited. Some research has attempted to design a general framework that provides more advanced manipulation capabilities for deformable rearranging tasks, with considerable progress achieved in simulation. However, transferring from simulation to reality is difficult due to the limitations of end-to-end CNN architectures. To address these challenges, we design a local GNN (Graph Neural Network) based learning method that uses two representation graphs to encode keypoints detected from images. Self-attention is applied for graph updating and cross-attention is applied for generating manipulation actions. Extensive experiments demonstrate that our framework is effective on multiple 1-D (rope, rope ring) and 2-D (cloth) rearranging tasks in simulation and can be easily transferred to a real robot by fine-tuning a keypoint detector.
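A minimal PyTorch sketch of the attention pattern the abstract describes: self-attention updates each keypoint graph, and cross-attention between the current and goal graphs scores pick/place candidates. The module name, feature sizes, and scoring heads are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LocalGraphAttention(nn.Module):
    """Hypothetical sketch: self-attention updates each keypoint graph,
    cross-attention relates current keypoints to goal keypoints."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pick_score = nn.Linear(dim, 1)   # score each current keypoint as a pick candidate
        self.place_score = nn.Linear(dim, 1)  # score each current keypoint as a place target

    def forward(self, cur_nodes, goal_nodes):
        # cur_nodes, goal_nodes: (B, K, dim) keypoint embeddings from a detector
        cur, _ = self.self_attn(cur_nodes, cur_nodes, cur_nodes)      # update current graph
        goal, _ = self.self_attn(goal_nodes, goal_nodes, goal_nodes)  # update goal graph
        fused, _ = self.cross_attn(cur, goal, goal)                   # current attends to goal
        return self.pick_score(fused).squeeze(-1), self.place_score(fused).squeeze(-1)

model = LocalGraphAttention()
pick_logits, place_logits = model(torch.randn(2, 16, 64), torch.randn(2, 16, 64))
print(pick_logits.shape)  # torch.Size([2, 16])
```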

* Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems 2022 (IROS 2022)

Graph-Transporter: A Graph-based Learning Method for Goal-Conditioned Deformable Object Rearranging Task

Feb 21, 2023
Yuhong Deng, Chongkun Xia, Xueqian Wang, Lipeng Chen

Rearranging deformable objects is a long-standing challenge in robotic manipulation due to the high dimensionality of the configuration space and the complex dynamics of deformable objects. We present a novel framework, Graph-Transporter, for goal-conditioned deformable object rearranging tasks. To tackle the complex configuration space and dynamics, we represent the configuration space of a deformable object with a graph structure and encode the graph features with a graph convolution network. Our framework adopts an architecture based on a Fully Convolutional Network (FCN) to output pixel-wise pick-and-place actions from visual input alone. Extensive experiments validate the effectiveness of the graph representation of deformable object configurations and demonstrate that our framework is effective and general in handling goal-conditioned deformable object rearranging tasks.
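The graph-plus-FCN idea can be sketched roughly as follows (hypothetical PyTorch code, not the released implementation): a graph convolution encodes the configuration graph, the pooled graph code is broadcast over the image plane, and a small fully convolutional head emits pixel-wise pick and place heatmaps.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """Plain graph convolution: aggregate neighbour features via a dense adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (B, N, in_dim), adj: (B, N, N) normalized adjacency
        return torch.relu(self.lin(torch.bmm(adj, x)))

class GraphToHeatmaps(nn.Module):
    """Hypothetical sketch: encode the configuration graph, broadcast the graph
    code over the image plane, and emit pixel-wise pick/place heatmaps."""
    def __init__(self, node_dim=2, gcn_dim=32):
        super().__init__()
        self.gcn = GraphConvLayer(node_dim, gcn_dim)
        self.fcn = nn.Sequential(
            nn.Conv2d(3 + gcn_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 1),  # channel 0: pick heatmap, channel 1: place heatmap
        )

    def forward(self, image, nodes, adj):
        # image: (B, 3, H, W); nodes: (B, N, node_dim); adj: (B, N, N)
        g = self.gcn(nodes, adj).mean(dim=1)                      # (B, gcn_dim) graph code
        g = g[:, :, None, None].expand(-1, -1, *image.shape[2:])  # broadcast over pixels
        return self.fcn(torch.cat([image, g], dim=1))             # (B, 2, H, W)

# Random (unnormalized) inputs, used here only to check tensor shapes.
net = GraphToHeatmaps()
maps = net(torch.randn(1, 3, 64, 64), torch.randn(1, 20, 2), torch.rand(1, 20, 20))
print(maps.shape)  # torch.Size([1, 2, 64, 64])
```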

* Accepted by the IEEE International Conference on Systems, Man and Cybernetics 2022 (SMC 2022)

Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With Space-Time Attention

Jan 08, 2023
Kai Mo, Chongkun Xia, Xueqian Wang, Yuhong Deng, Xuehai Gao, Bin Liang

Sequential multi-step cloth manipulation is a challenging problem in robotic manipulation, requiring a robot to perceive the cloth state and plan a sequence of chained actions leading to the desired state. Most previous works address this problem in a goal-conditioned way, and a goal observation must be given for each specific task and cloth configuration, which is neither practical nor efficient. Thus, we present a novel multi-step cloth manipulation planning framework named Foldsformer. Foldsformer can complete similar tasks with only a general demonstration and utilizes a space-time attention mechanism to capture the instruction information behind this demonstration. We experimentally evaluate Foldsformer on four representative sequential multi-step manipulation tasks and show that it significantly outperforms state-of-the-art approaches in simulation. Foldsformer can complete multi-step cloth manipulation tasks even when the cloth configuration (e.g., size and pose) differs from the configurations in the general demonstrations. Furthermore, our approach can be transferred from simulation to the real world without additional training or domain randomization. Although trained only on rectangular cloth, our approach also generalizes to unseen cloth shapes (T-shirts and shorts). Videos and source code are available at: https://sites.google.com/view/foldsformer.
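A rough sketch of a divided space-time attention block of the kind the abstract mentions, assuming PyTorch and patch tokens per frame; the temporal-then-spatial factorization and all sizes are illustrative assumptions, not Foldsformer's actual design.

```python
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """Hypothetical sketch of divided space-time attention: attend across
    frames at each spatial location, then across locations within each frame."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):
        # tokens: (B, T, P, dim) -- T frames (demonstration + current), P patches per frame
        B, T, P, D = tokens.shape
        t = tokens.permute(0, 2, 1, 3).reshape(B * P, T, D)   # temporal attention per patch
        t, _ = self.time_attn(t, t, t)
        t = t.reshape(B, P, T, D).permute(0, 2, 1, 3)
        s = t.reshape(B * T, P, D)                            # spatial attention per frame
        s, _ = self.space_attn(s, s, s)
        return s.reshape(B, T, P, D)

block = SpaceTimeBlock()
out = block(torch.randn(2, 4, 49, 64))   # e.g. 3 demonstration frames + 1 current observation
print(out.shape)  # torch.Size([2, 4, 49, 64])
```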

* IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 760-767, Feb. 2023  
* 8 pages, 6 figures, published in IEEE Robotics and Automation Letters (RA-L)

Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds

Nov 30, 2022
Shoujie Li, Haixin Yu, Wenbo Ding, Houde Liu, Linqi Ye, Chongkun Xia, Xueqian Wang, Xiao-Ping Zhang

The accurate detection and grasping of transparent objects are challenging but significant for robots. Here, a visual-tactile fusion framework for transparent object grasping under complex backgrounds and varying light conditions is proposed, comprising grasping position detection, tactile calibration, and visual-tactile fusion based classification. First, a multi-scene synthetic grasping dataset generation method with Gaussian-distribution-based data annotation is proposed. In addition, a novel grasping network named TGCNN is proposed for grasping position detection, showing good results in both synthetic and real scenes. For tactile calibration, inspired by human grasping, a fully convolutional network based tactile feature extraction method and a central-location-based adaptive grasping strategy are designed, improving the success rate by 36.7% compared to direct grasping. Furthermore, a visual-tactile fusion method is proposed for transparent object classification, which improves the classification accuracy by 34%. The proposed framework synergizes the advantages of vision and touch and greatly improves the grasping efficiency of transparent objects.
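A hypothetical PyTorch sketch of the fusion step: a small visual branch and a fully convolutional tactile branch produce features that are concatenated and classified. The branch depths, feature sizes, and class count are placeholder assumptions, not TGCNN or the authors' classifier.

```python
import torch
import torch.nn as nn

class VisualTactileClassifier(nn.Module):
    """Hypothetical sketch: extract visual and tactile features separately,
    concatenate them, and classify the grasped transparent object."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.visual = nn.Sequential(                  # tiny CNN over the RGB crop
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.tactile = nn.Sequential(                 # fully convolutional tactile branch
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, rgb, tactile_map):
        # rgb: (B, 3, H, W); tactile_map: (B, 1, H', W') single-channel tactile image
        return self.head(torch.cat([self.visual(rgb), self.tactile(tactile_map)], dim=1))

clf = VisualTactileClassifier()
logits = clf(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 64, 64))
print(logits.shape)  # torch.Size([1, 10])
```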

Polarimetric Inverse Rendering for Transparent Shapes Reconstruction

Aug 25, 2022
Mingqi Shao, Chongkun Xia, Dongxu Duan, Xueqian Wang

In this work, we propose a novel method for the detailed reconstruction of transparent objects by exploiting polarimetric cues. Most existing methods lack sufficient constraints and suffer from over-smoothing. Hence, we introduce polarization information as a complementary cue. We implicitly represent the object's geometry as a neural network, while a polarization renderer renders the object's polarization images from the given shape and illumination configuration. Directly comparing the rendered polarization images to the real-world captured images would incur additional errors due to transmission through the transparent object. To address this issue, we introduce the concept of reflection percentage, which represents the proportion of the reflection component. The reflection percentage is calculated by a ray tracer and then used to weight the polarization loss. We build a polarization dataset for multi-view transparent shape reconstruction to verify our method. The experimental results show that our method recovers detailed shapes and improves the reconstruction quality of transparent objects. Our dataset and code will be publicly available at https://github.com/shaomq2187/TransPIR.
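The reflection-percentage weighting can be illustrated with a short, hypothetical loss sketch (assuming per-pixel polarization images and an L1 residual; the abstract does not specify the exact loss form, so this is only one plausible instance):

```python
import torch

def weighted_polarization_loss(rendered, captured, reflection_pct):
    """Hypothetical sketch of the weighting idea: pixels dominated by
    transmission (low reflection percentage) contribute less to the loss.

    rendered, captured: (B, C, H, W) polarization images
    reflection_pct:     (B, 1, H, W) per-pixel reflection proportion in [0, 1]
    """
    per_pixel = (rendered - captured).abs()        # L1 photometric residual
    return (reflection_pct * per_pixel).mean()     # down-weight transmission-dominated pixels

loss = weighted_polarization_loss(
    torch.rand(1, 4, 32, 32), torch.rand(1, 4, 32, 32), torch.rand(1, 1, 32, 32))
print(loss.item())
```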

Transparent Shape from Single Polarization Images

Apr 20, 2022
Mingqi Shao, Chongkun Xia, Zhendong Yang, Junnan Huang, Xueqian Wang

This paper presents a data-driven approach to transparent shape from polarization. Due to their inherently high transmittance, previous shape from polarization (SfP) methods based on the specular reflection model have difficulty estimating transparent shape, and the lack of datasets for transparent SfP also limits the application of data-driven approaches. Hence, we construct a transparent SfP dataset consisting of both synthetic and real-world data. To determine the reliability of the physics-based reflection model, we define a physics-based prior confidence that exploits the inherent limitations of polarization information, and we propose a multi-branch fusion network to embed this confidence. Experimental results show that our approach outperforms other SfP methods. Compared with the previous method, the mean and median angular errors of our approach are reduced from $19.00^\circ$ and $14.91^\circ$ to $16.72^\circ$ and $13.36^\circ$, and the accuracies at $11.25^\circ$, $22.5^\circ$, and $30^\circ$ are improved from $38.36\%$, $77.36\%$, $87.48\%$ to $45.51\%$, $78.86\%$, $89.98\%$, respectively.
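Assuming the angular errors are measured on predicted surface normals, as is common in SfP evaluation, the reported metrics could be computed roughly as follows (illustrative sketch only, not the paper's evaluation code):

```python
import torch
import torch.nn.functional as F

def normal_metrics(pred, gt):
    """Hypothetical sketch of the reported metrics: per-pixel angular error
    between predicted and ground-truth unit normals, plus the fraction of
    pixels within 11.25, 22.5, and 30 degrees.

    pred, gt: (N, 3) unit surface normals
    """
    cos = (pred * gt).sum(dim=-1).clamp(-1.0, 1.0)
    err = torch.rad2deg(torch.acos(cos))
    return {
        "mean": err.mean().item(),
        "median": err.median().item(),
        "acc@11.25": (err < 11.25).float().mean().item(),
        "acc@22.5": (err < 22.5).float().mean().item(),
        "acc@30": (err < 30.0).float().mean().item(),
    }

n = F.normalize(torch.randn(1000, 3), dim=-1)
print(normal_metrics(n, n)["mean"])  # ~0 for identical normals
```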
