Abstract: In visual semantic navigation, a robot navigates to a target object using egocentric visual observations, given only the class label of the target. This meaningful task has inspired a surge of related research. However, most existing models are effective only for single-agent navigation, and a single agent suffers from low efficiency and poor fault tolerance when completing more complicated tasks. Multi-agent collaboration can improve efficiency and has strong application potential. In this paper, we propose multi-agent visual semantic navigation, in which multiple agents collaborate to find multiple target objects. It is a challenging task that requires agents to learn reasonable collaboration strategies for efficient exploration under communication bandwidth constraints. We develop a hierarchical decision framework based on semantic mapping, scene prior knowledge, and a communication mechanism to solve this task. Test results in unseen scenes with both known and unknown objects demonstrate that the proposed model achieves higher accuracy and efficiency than the single-agent model.
Abstract: In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions using external knowledge. Unlike existing EQA work, in which the target object is explicitly specified in the question, the agent can resort to external knowledge to understand more complicated questions such as "Please tell me what objects are used to cut food in the room?", for which the agent must know, for example, that a knife is used for cutting food. To address the K-EQA problem, we propose a novel framework based on neural program synthesis, in which joint reasoning over external knowledge and a 3D scene graph is performed to realize navigation and question answering. In particular, the 3D scene graph provides memory for storing the visual information of visited scenes, which significantly improves the efficiency of multi-turn question answering. Experimental results demonstrate that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.
Abstract: Tactile sensing plays an important role in robotic perception and manipulation tasks. To overcome the real-world limitations of data collection, simulating tactile responses in a virtual environment is a desirable direction of robotic research. In this paper, we propose Elastic Interaction of Particles (EIP) for tactile simulation. Most existing works model the tactile sensor as a rigid multi-body, which can neither reflect the elastic property of the tactile sensor nor characterize the fine-grained physical interaction between the two contacting objects. By contrast, EIP models the tactile sensor as a group of coordinated particles, and elasticity is applied to regulate the deformation of the particles during contact. Building on the tactile simulation provided by EIP, we further propose a tactile-visual perception network that enables information fusion between tactile data and visual images. The perception network is based on a global-to-local fusion mechanism, in which multi-scale tactile features are aggregated into the corresponding local region of the visual modality under the guidance of tactile positions and directions. The fusion method shows clear advantages on the 3D geometric reconstruction task.
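One way to picture the global-to-local fusion step is the following PyTorch sketch, which projects a tactile feature vector and adds it only into a local window of the visual feature map around the contact position. The module name, shapes, and single-scale simplification are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LocalTactileFusion(nn.Module):
    """Illustrative global-to-local fusion: tactile features only influence the
    visual feature map in a small window around the contact location."""
    def __init__(self, tactile_dim, visual_channels, window=3):
        super().__init__()
        self.proj = nn.Linear(tactile_dim, visual_channels)  # tactile -> visual channels
        self.half = window // 2

    def forward(self, visual_feat, tactile_feat, contact_uv):
        # visual_feat: (B, C, H, W); tactile_feat: (B, D); contact_uv: (B, 2) pixel coords
        fused = visual_feat.clone()
        tac = self.proj(tactile_feat)                         # (B, C)
        B, C, H, W = visual_feat.shape
        for b in range(B):
            u, v = contact_uv[b].long().tolist()
            u0, u1 = max(u - self.half, 0), min(u + self.half + 1, H)
            v0, v1 = max(v - self.half, 0), min(v + self.half + 1, W)
            # only the local region around the contact receives tactile cues
            fused[b, :, u0:u1, v0:v1] += tac[b].view(C, 1, 1)
        return fused

# Toy usage with random features and a single contact point per sample.
fusion = LocalTactileFusion(tactile_dim=32, visual_channels=64)
out = fusion(torch.randn(2, 64, 28, 28), torch.randn(2, 32),
             torch.tensor([[10, 12], [5, 20]]))
```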
Abstract: We propose a compact and effective framework to fuse multimodal features at multiple layers within a single network. The framework consists of two innovative fusion schemes. First, unlike existing multimodal methods that require individual encoders for different modalities, we verify that multimodal features can be learnt within a single shared network by merely maintaining modality-specific batch normalization layers in the encoder, which also enables implicit fusion via joint feature representation learning. Second, we propose a bidirectional multi-layer fusion scheme in which multimodal features are exploited progressively. To take advantage of this scheme, we introduce two asymmetric fusion operations, channel shuffle and pixel shift, which learn different fused features with respect to different fusion directions. These two operations are parameter-free; they strengthen multimodal feature interactions across channels and enhance spatial feature discrimination within channels. We conduct extensive experiments on semantic segmentation and image translation tasks, based on three publicly available datasets covering diverse modalities. Results indicate that our proposed framework is general, compact, and superior to state-of-the-art fusion frameworks.
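To make the shared-encoder idea concrete, here is a minimal PyTorch sketch assuming two modalities (e.g., RGB and depth) of identical shape. The module names, the group-wise channel swap, and the single-pixel roll are illustrative stand-ins for the paper's channel shuffle and pixel shift operations, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SharedConvBlock(nn.Module):
    """Shared convolution with modality-specific BatchNorm layers: the conv
    weights are shared; only BN statistics/affine parameters are per modality."""
    def __init__(self, in_ch, out_ch, num_modalities=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)                # shared weights
        self.bns = nn.ModuleList([nn.BatchNorm2d(out_ch) for _ in range(num_modalities)])

    def forward(self, x, modality):
        return torch.relu(self.bns[modality](self.conv(x)))

def channel_shuffle_fuse(feat_a, feat_b, groups=4):
    """Parameter-free cross-channel mixing: swap every other channel group
    between the two modalities (an illustrative form of channel shuffle)."""
    b, c, h, w = feat_a.shape
    ga = feat_a.view(b, groups, c // groups, h, w)
    gb = feat_b.view(b, groups, c // groups, h, w)
    swap = torch.arange(groups) % 2 == 1
    out_a, out_b = ga.clone(), gb.clone()
    out_a[:, swap], out_b[:, swap] = gb[:, swap], ga[:, swap]
    return out_a.reshape(b, c, h, w), out_b.reshape(b, c, h, w)

def pixel_shift_fuse(feat_a, feat_b, shift=1):
    """Parameter-free spatial mixing: inject a spatially shifted copy of one
    modality into the other (an illustrative form of pixel shift)."""
    return feat_a + torch.roll(feat_b, shifts=shift, dims=-1)

# Toy usage: one shared block applied to both modalities, then fused.
block = SharedConvBlock(3, 16)
rgb, depth = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
f_rgb, f_depth = block(rgb, modality=0), block(depth, modality=1)
f_rgb, f_depth = channel_shuffle_fuse(f_rgb, f_depth)
f_rgb = pixel_shift_fuse(f_rgb, f_depth)
```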
Abstract: Robotic grasping methods based on sparse partial point clouds have attained strong grasping performance on various objects, but they often generate incorrect grasp candidates due to the lack of geometric information about the object. In this work, we propose TransSC, a novel and robust shape completion model that takes a partial point cloud as input and uses a transformer-based encoder to explore richer point-wise features and a manifold-based decoder to exploit finer object details. Quantitative experiments verify the effectiveness of the proposed shape completion network and demonstrate that it outperforms existing methods. In addition, TransSC is integrated into a grasp evaluation network to generate a set of grasp candidates. Simulation experiments show that TransSC improves grasp generation compared to existing shape completion baselines. Furthermore, our robotic experiments show that, with TransSC, the robot grasps objects randomly placed on a support surface more successfully.
Abstract: Learning skills from long-horizon, unannotated demonstrations remains a challenge for agents. Existing approaches such as Hierarchical Imitation Learning (HIL) are prone to compounding errors or suboptimal solutions. In this paper, we propose Option-GAIL, a novel method for learning skills over long horizons. The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization. In particular, we propose an Expectation-Maximization (EM)-style algorithm: an E-step that samples the options of the expert conditioned on the currently learned policy, and an M-step that updates the low- and high-level policies of the agent simultaneously to minimize the newly proposed option-occupancy measure between the expert and the agent. We theoretically prove the convergence of the proposed algorithm. Experiments show that Option-GAIL consistently outperforms its counterparts across a variety of tasks.
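As a rough illustration of the adversarial part of this EM loop, the PyTorch sketch below defines a discriminator over (state, action, option) tuples and one M-step update on it. The class name, dimensions, and toy batches are assumptions for illustration, not the released Option-GAIL code.

```python
import torch
import torch.nn as nn

class OptionDiscriminator(nn.Module):
    """Scores (state, action, option) tuples; its output defines the adversarial
    signal that matches the option-occupancy measure of expert and agent."""
    def __init__(self, state_dim, action_dim, num_options, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + num_options, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, option_onehot):
        return self.net(torch.cat([state, action, option_onehot], dim=-1))

def discriminator_step(disc, optimizer, expert_batch, agent_batch):
    """One M-step discriminator update: expert tuples labeled 1, agent tuples 0."""
    bce = nn.BCEWithLogitsLoss()
    exp_logits, agt_logits = disc(*expert_batch), disc(*agent_batch)
    loss = bce(exp_logits, torch.ones_like(exp_logits)) + \
           bce(agt_logits, torch.zeros_like(agt_logits))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random batches (state_dim=4, action_dim=2, num_options=3).
disc = OptionDiscriminator(4, 2, 3)
optim = torch.optim.Adam(disc.parameters(), lr=1e-3)
make = lambda n: (torch.randn(n, 4), torch.randn(n, 2),
                  torch.eye(3)[torch.randint(0, 3, (n,))])
discriminator_step(disc, optim, make(32), make(32))
```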
Abstract: Model Predictive Control (MPC) has shown great performance in target optimization and constraint satisfaction. However, the heavy computation of the Optimal Control Problem (OCP) at each triggering instant introduces a serious delay between state sampling and the control signal, which limits the application of MPC to resource-limited robot manipulator systems performing complicated tasks. In this paper, we propose a novel robust tube-based smooth-MPC strategy for nonlinear robot manipulator planning systems with disturbances and constraints. Based on piecewise linearization and state prediction, our control strategy improves the smoothness of the control process and reduces its delay. By bounding the deviation between the real and the nominal system states, we can predict the set of possible next real states at the current instant. Using this state set as the initial condition, we solve the next OCP in advance and store the optimal controls computed for the nominal system states, which eliminates the delay. Furthermore, we linearize the nonlinear system within a given error bound, reducing the complexity of the OCP and improving the response speed. Based on the theoretical framework of tube MPC, we prove that the control strategy is recursively feasible and closed-loop stable under the constraints and disturbances. Numerical simulations verify the efficacy of the designed approach compared with conventional MPC.
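The delay-compensation idea (predict the nominal next state now and pre-solve the next OCP while the current control is still being applied) can be illustrated with a toy example. The double-integrator model, the finite-horizon LQR used as a stand-in for the OCP, and the disturbance bound below are assumptions for illustration, not the paper's manipulator system.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy discrete-time double integrator
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])
W_BOUND = 0.01                            # bound on the additive disturbance

def solve_ocp(x0, horizon=20):
    """Finite-horizon LQR (backward Riccati recursion) standing in for the
    nominal OCP; returns the first-step feedback gain."""
    P = Q.copy()
    K = np.zeros((1, 2))
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

x = np.array([1.0, 0.0])                   # measured (real) state
u = -(solve_ocp(x) @ x)                    # control for the current instant
for t in range(50):
    x_nominal_next = A @ x + B @ u         # predict the nominal next state now
    K_next = solve_ocp(x_nominal_next)     # pre-solve the *next* OCP while the
                                           # current control is executing
    w = np.random.uniform(-W_BOUND, W_BOUND, size=2)
    x = A @ x + B @ u + w                  # real state evolves with disturbance
    u = -(K_next @ x_nominal_next)         # apply the pre-computed control,
                                           # based on the nominal state, with no delay
```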
Abstract: Compared to humans and animals, robots have a limited ability to adapt to damage. However, robot damage is prevalent in real-world applications, especially for robots deployed in extreme environments, and this fragility greatly limits their widespread application. We propose an adversarial reinforcement learning framework that significantly increases robot robustness to joint damage in both manipulation and locomotion tasks. The agent is trained iteratively on the joint damage cases where it performs poorly. We validate our algorithm on a three-fingered robot hand and a quadruped robot. Our algorithm can be trained entirely in simulation and deployed directly on a real robot without any fine-tuning. It also achieves high success rates across arbitrary joint damage cases.
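A rough sketch of this iterative scheme is shown below, assuming a gymnasium-style environment with per-joint torque actions. The wrapper name and the helper functions passed in (make_env, train_policy, evaluate) are hypothetical placeholders, not the authors' code.

```python
import numpy as np
import gymnasium as gym

class JointDamageWrapper(gym.ActionWrapper):
    """Simulates joint damage by zeroing the torque commands of the damaged joints."""
    def __init__(self, env, damaged_joints):
        super().__init__(env)
        self.damaged_joints = list(damaged_joints)

    def action(self, action):
        action = np.array(action, copy=True)
        action[self.damaged_joints] = 0.0          # damaged joints produce no torque
        return action

def adversarial_training(make_env, train_policy, evaluate, num_joints, rounds=10):
    """Hypothetical outer loop: evaluate the current policy under each
    single-joint damage case, then keep training under the worst one."""
    policy = None
    for _ in range(rounds):
        scores = {j: evaluate(policy, JointDamageWrapper(make_env(), [j]))
                  for j in range(num_joints)}
        worst = min(scores, key=scores.get)        # adversary picks the weakest case
        policy = train_policy(policy, JointDamageWrapper(make_env(), [worst]))
    return policy
```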
Abstract: Tactile sensing plays an important role in robotic perception and manipulation. To overcome the real-world limitations of data collection, simulating tactile responses in a virtual environment is a desirable direction of robotic research. Most existing works model the tactile sensor as a rigid multi-body, which can neither reflect the elastic property of the tactile sensor nor characterize the fine-grained physical interaction between two objects. In this paper, we propose Elastic Interaction of Particles (EIP), a novel framework for tactile emulation. At its core, EIP models the tactile sensor as a group of coordinated particles, and elasticity theory is applied to regulate the deformation of the particles during the contact process. EIP is implemented from scratch, without resorting to any existing physics engine. Experiments verifying the effectiveness of our method have been carried out on two applications: robotic perception with tactile data and 3D geometric reconstruction by tactile-visual fusion. Our method has the potential to open up a new vein of robotic tactile simulation and to contribute to various downstream robotic tasks.
Abstract: Deep multimodal fusion, which uses multiple sources of data for classification or regression, has exhibited a clear advantage over its unimodal counterpart in various applications. Yet current methods, including aggregation-based and alignment-based fusion, remain inadequate at balancing the trade-off between inter-modal fusion and intra-modal processing, creating a bottleneck for performance improvement. To this end, this paper proposes the Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance, measured by the magnitude of the Batch-Normalization (BN) scaling factor during training. The validity of this exchanging process is also guaranteed by sharing convolutional filters while keeping separate BN layers across modalities, which, as an added benefit, allows our multimodal architecture to be almost as compact as a unimodal network. Extensive experiments on semantic segmentation with RGB-D data and image translation with multi-domain input verify the effectiveness of CEN compared to current state-of-the-art methods. Detailed ablation studies further affirm the advantage of each component we propose. Our code is available at https://github.com/yikaiw/CEN.
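The BN-guided exchanging rule lends itself to a compact illustration. The PyTorch sketch below, assuming a two-modality (RGB and depth) setting and an illustrative threshold, replaces channels whose BN scaling factor magnitude is small with the partner modality's channels; it is a simplified reading of the description above, not the released CEN implementation.

```python
import torch
import torch.nn as nn

def exchange_channels(feat_rgb, feat_depth, bn_rgb, bn_depth, threshold=1e-2):
    """Channels whose BN scaling factor |gamma| falls below the threshold are
    deemed unimportant and are replaced by the other modality's channels."""
    mask_rgb = bn_rgb.weight.abs() < threshold      # unimportant RGB channels
    mask_depth = bn_depth.weight.abs() < threshold  # unimportant depth channels
    out_rgb, out_depth = feat_rgb.clone(), feat_depth.clone()
    out_rgb[:, mask_rgb] = feat_depth[:, mask_rgb]      # take partner's channels
    out_depth[:, mask_depth] = feat_rgb[:, mask_depth]
    return out_rgb, out_depth

# Toy usage: per-modality BN layers applied before the exchange step.
bn_rgb, bn_depth = nn.BatchNorm2d(8), nn.BatchNorm2d(8)
x_rgb, x_depth = torch.randn(2, 8, 16, 16), torch.randn(2, 8, 16, 16)
y_rgb, y_depth = exchange_channels(bn_rgb(x_rgb), bn_depth(x_depth), bn_rgb, bn_depth)
```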