Rong Xiong

Learning adaptive manipulation of objects with revolute joint: A case study on varied cabinet doors opening

Apr 28, 2023

Hongxiang Yu, Dashun Guo, Zhongxiang Zhou, Yue Wang, Rong Xiong

Abstract: This paper introduces a learning-based framework for robots to adaptively manipulate objects with a revolute joint in unstructured environments. We concentrate our discussion on various cabinet door opening tasks. To improve the performance of Deep Reinforcement Learning in this scene, we analytically derive an efficient sampling scheme that utilizes the constraints of the objects. To open various kinds of doors, we add encoded environment parameters that define the various environments to the input of our policy. To transfer the policy into the real world, we train an adaptation module in simulation and fine-tune the adaptation module to cut down the impact of the policy-unaware environment parameters. We design a series of experiments to validate the efficacy of our framework. Additionally, we test the model's performance in the real world against a traditional door opening method.

Zero-shot Transfer Learning of Driving Policy via Socially Adversarial Traffic Flow

Apr 25, 2023

Dongkun Zhang, Jintao Xue, Yuxiang Cui, Yunkai Wang, Eryun Liu, Wei Jing, Junbo Chen, Rong Xiong, Yue Wang

Abstract: Acquiring driving policies that can transfer to unseen environments is challenging when driving in dense traffic flows. The design of traffic flow is essential, and previous studies are unable to balance interaction and safety-criticality. To tackle this problem, we propose a socially adversarial traffic flow. We propose a Contextual Partially-Observable Stochastic Game to model traffic flow and assign Social Value Orientation (SVO) as context. We then adopt a two-stage framework. In Stage 1, each agent in our socially-aware traffic flow is driven by a hierarchical policy in which the upper-level policy communicates the genuine SVOs of all agents, which the lower-level policy takes as input. In Stage 2, each agent in the socially adversarial traffic flow is driven by the hierarchical policy in which the upper-level policy communicates mistaken SVOs, which are taken as input by the lower-level policy trained in Stage 1. The driving policy is adversarially trained through a zero-sum game formulation with the upper-level policies, resulting in a policy with enhanced zero-shot transfer capability to unseen traffic flows. Comprehensive cross-validation experiments verify the superior zero-shot transfer performance of our method.
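
For readers unfamiliar with Social Value Orientation, the following is a minimal sketch of how an SVO value is commonly turned into a utility: an angle blends an agent's own reward with the reward of others. The abstract does not state the exact formulation used in the paper, so the cosine/sine weighting, the function name `svo_utility`, and the reward values below are illustrative assumptions.

```python
import math


def svo_utility(reward_self, reward_others, svo_angle_rad):
    """Blend own and others' reward using an SVO angle.

    Assumed (commonly used) form: cos(angle) weights the agent's own
    reward, sin(angle) weights the reward of the other agents.
    """
    return (math.cos(svo_angle_rad) * reward_self
            + math.sin(svo_angle_rad) * reward_others)


# Hypothetical rewards for one time step.
egoistic = svo_utility(1.0, 0.5, 0.0)            # only own reward counts
prosocial = svo_utility(1.0, 0.5, math.pi / 4)   # both weighted equally
altruistic = svo_utility(1.0, 0.5, math.pi / 2)  # only others' reward counts
```

Under this assumed form, communicating a mistaken SVO to the lower-level policy, as in Stage 2, would amount to feeding it an angle that differs from the one actually governing the agent.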

A Hyper-network Based End-to-end Visual Servoing with Arbitrary Desired Poses

Apr 18, 2023

Hongxiang Yu, Anzhe Chen, Kechun Xu, Zhongxiang Zhou, Wei Jing, Yue Wang, Rong Xiong

Abstract: Recently, several works have achieved end-to-end visual servoing (VS) for robotic manipulation by replacing the traditional controller with differentiable neural networks, but they lose the ability to servo to arbitrary desired poses. This letter proposes a differentiable architecture for arbitrary pose servoing: a hyper-network based neural controller (HPN-NC). To achieve this, HPN-NC consists of a hyper net and a low-level controller, where the hyper net learns to generate the parameters of the low-level controller and the controller uses the 2D keypoint error for control, as in traditional image-based visual servoing (IBVS). HPN-NC can complete 6-degree-of-freedom visual servoing with a large initial offset. Taking advantage of the fully differentiable nature of HPN-NC, we provide a three-stage training procedure to servo real-world objects. With self-supervised end-to-end training, the performance of the integrated model can be further improved in unseen scenes and the amount of manual annotation can be significantly reduced.

NF-Atlas: Multi-Volume Neural Feature Fields for Large Scale LiDAR Mapping

Apr 10, 2023

Xuan Yu, Yili Liu, Sitong Mao, Shunbo Zhou, Rong Xiong, Yiyi Liao, Yue Wang

Abstract: LiDAR mapping has been a long-standing problem in robotics. Recent progress in neural implicit representation has brought new opportunities to robotic mapping. In this paper, we propose multi-volume neural feature fields, called NF-Atlas, which bridge neural feature volumes with pose graph optimization. By regarding the neural feature volumes as pose graph nodes and the relative poses between volumes as pose graph edges, the entire neural feature field becomes both locally rigid and globally elastic. Locally, the neural feature volume employs a sparse feature Octree and a small MLP to encode the submap SDF, optionally with semantics. Learning the map using this structure allows for end-to-end solving of maximum a posteriori (MAP) based probabilistic mapping. Globally, the map is built volume by volume independently, avoiding catastrophic forgetting when mapping incrementally. Furthermore, when a loop closure occurs, the elastic pose graph based representation requires only the origins of the neural volumes to be updated, without remapping. Finally, these functionalities of NF-Atlas are validated. Thanks to the sparsity and the optimization-based formulation, NF-Atlas shows competitive performance in terms of accuracy, efficiency and memory usage on both simulation and real-world datasets.
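
The "locally rigid, globally elastic" idea can be illustrated with a toy sketch. It assumes a 2D world and replaces each neural feature volume with a plain list of points stored in the volume's own frame; the `Volume` class and the hand-set pose correction are hypothetical and stand in for the paper's Octree/MLP volumes and pose graph optimization, neither of which is implemented here.

```python
import math


class Volume:
    """Toy submap: content is stored relative to the volume's own origin."""

    def __init__(self, x, y, theta):
        self.pose = (x, y, theta)   # origin of the volume in the world frame
        self.local_points = []      # map content in the volume frame

    def add_world_point(self, wx, wy):
        # Convert a world observation into the volume frame once, at mapping time.
        x, y, th = self.pose
        dx, dy = wx - x, wy - y
        c, s = math.cos(th), math.sin(th)
        self.local_points.append((c * dx + s * dy, -s * dx + c * dy))

    def world_points(self):
        # Re-express the stored content using the current origin.
        x, y, th = self.pose
        c, s = math.cos(th), math.sin(th)
        return [(x + c * px - s * py, y + s * px + c * py)
                for px, py in self.local_points]


vol = Volume(1.0, 0.0, 0.0)
vol.add_world_point(2.0, 0.0)
stored_before = list(vol.local_points)

# A loop closure (correction chosen by hand here) moves only the origin.
vol.pose = (1.0, 0.5, 0.0)
corrected = vol.world_points()
```

After the correction, the stored local content is unchanged while its world position shifts with the origin, which is the property that lets the map be corrected without remapping.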

Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach

Apr 06, 2023

Zhixuan Xu, Kechun Xu, Yue Wang, Rong Xiong

Abstract: We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions. Previous works based on rule-based language parsing or scene-centric visual representation either restrict the form of instructions and reference objects or require large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample efficient and generalizable. Experiments indicate that our model can achieve a 97.75% placement success rate with only ~0.26M trainable parameters. In addition, our method generalizes better to both unseen objects and instructions. Moreover, with only 25% of the training data, we still outperform the top competing approach.

* 6 pages, 6 figures

UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields

Mar 28, 2023

Yuanbo Yang, Yifei Yang, Hanlei Guo, Rong Xiong, Yue Wang, Yiyi Liao

Abstract: Generating photorealistic images with controllable camera pose and scene contents is essential for many applications including AR/VR and simulation. Despite the fact that rapid progress has been made in 3D-aware generative models, most existing methods focus on object-centric images and are not applicable to generating urban scenes for free camera viewpoint control and scene editing. To address this challenging task, we propose UrbanGIRAFFE, which uses a coarse 3D panoptic prior, including the layout distribution of uncountable stuff and countable objects, to guide a 3D-aware generative model. Our model is compositional and controllable as it breaks down the scene into stuff, objects, and sky. Using stuff prior in the form of semantic voxel grids, we build a conditioned stuff generator that effectively incorporates the coarse semantic and geometry information. The object layout prior further allows us to learn an object generator from cluttered scenes. With proper loss functions, our approach facilitates photorealistic 3D-aware image synthesis with diverse controllability, including large camera movement, stuff editing, and object manipulation. We validate the effectiveness of our model on both synthetic and real-world datasets, including the challenging KITTI-360 dataset.

* Project page: https://lv3d.github.io/urbanGIRAFFE

GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates

Mar 17, 2023

Bingqi Shen, Shuwei Dai, Yuyin Chen, Rong Xiong, Yue Wang, Yanmei Jiao

Abstract: 3D object detection serves as the core basis of the perception tasks in autonomous driving. Recent years have seen rapid progress in multi-modal fusion strategies for more robust and accurate 3D object detection. However, current research on robust fusion relies entirely on learning-based frameworks, which demand a large amount of training data and are inconvenient to deploy in new scenes. In this paper, we propose GOOD, a general optimization-based fusion framework that can achieve satisfactory detection without training additional models and can be used with any combination of 2D and 3D detectors to improve the accuracy and robustness of 3D detection. First, we apply the mutual-sided nearest-neighbor probability model to achieve the 3D-2D data association. Then we design an optimization pipeline that can optimize different kinds of instances separately based on the matching result. In addition, a 3D MOT method is introduced to enhance performance with the aid of previous frames. To the best of our knowledge, this is the first optimization-based late fusion framework for multi-modal 3D object detection, and it can serve as a baseline for subsequent research. Experiments on both the nuScenes and KITTI datasets are carried out, and the results show that GOOD outperforms PointPillars by 9.1% in mAP and achieves results competitive with the learning-based late fusion method CLOCs.
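
The 3D-2D association step can be pictured with a simplified sketch of mutual nearest-neighbor matching. This is not the paper's probability model: it assumes the 3D candidates have already been projected to image-plane centers, uses plain Euclidean distance, and keeps a pair only when each side is the other's nearest neighbor. The function names and coordinates are hypothetical.

```python
import math


def nearest(src, dst):
    """For each point in src, return the index of its nearest point in dst."""
    return [min(range(len(dst)), key=lambda j: math.dist(p, dst[j]))
            for p in src]


def mutual_nearest_pairs(a, b):
    """Return (i, j) pairs where a[i] and b[j] are each other's nearest neighbor."""
    if not a or not b:
        return []
    a_to_b = nearest(a, b)
    b_to_a = nearest(b, a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Hypothetical image-plane centers (pixels).
projected_3d = [(100, 100), (300, 120), (500, 400)]
boxes_2d = [(305, 118), (98, 104)]

pairs = mutual_nearest_pairs(projected_3d, boxes_2d)
unmatched_3d = [i for i in range(len(projected_3d))
                if i not in {p[0] for p in pairs}]
```

Candidates left unmatched, such as the third projected center here, are the kind of instances a pipeline like the one described could handle separately from matched ones.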

Failure-aware Policy Learning for Self-assessable Robotics Tasks

Feb 25, 2023

Kechun Xu, Runjian Chen, Shuqi Zhao, Zizhang Li, Hongxiang Yu, Ci Chen, Yue Wang, Rong Xiong

Abstract: Self-assessment rules, which verify the feasibility of the selected action before actual execution, play an essential role in safe and effective real-world robotic applications. But how to utilize the self-assessment results to re-choose actions remains a challenge. Previous methods eliminate the selected action evaluated as failed by the self-assessment rules and re-choose the one with the next-highest affordance (i.e., the process-of-elimination strategy [1]), which ignores the dependency between the self-assessment results and the remaining untried actions. However, this dependency is important since the previous failures might help trim the remaining over-estimated actions. In this paper, we set out to investigate this dependency by learning a failure-aware policy. We propose two architectures for the failure-aware policy: representing the self-assessment results of previous failures as the variable state, and leveraging recurrent neural networks to implicitly memorize the previous failures. Experiments conducted on three tasks demonstrate that our method achieves higher task success rates with fewer trials. Moreover, when the actions are correlated, learning a failure-aware policy can achieve better performance than the process-of-elimination strategy.
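
The process-of-elimination baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumed names: `affordances` maps hypothetical action names to scores, and `passes_self_assessment` stands in for whatever feasibility rule a task provides. It is the baseline strategy, not the failure-aware policy the paper proposes.

```python
def process_of_elimination(affordances, passes_self_assessment):
    """Try actions in descending affordance order until one passes the check.

    Returns the accepted action (or None) and the number of trials used.
    """
    remaining = dict(affordances)
    trials = 0
    while remaining:
        action = max(remaining, key=remaining.get)
        trials += 1
        if passes_self_assessment(action):
            return action, trials
        # A failure removes only this action; the scores of the
        # remaining actions are left untouched.
        del remaining[action]
    return None, trials


# Hypothetical affordance scores and feasibility rule.
affordances = {"grasp_left": 0.9, "grasp_top": 0.8, "grasp_right": 0.4}
feasible = {"grasp_right"}

chosen, trials = process_of_elimination(affordances, lambda a: a in feasible)
```

In this toy run the two over-estimated actions are each tried before the feasible one is reached. The scores of untried actions are never revised after a failure, which is the ignored dependency the abstract points to.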

A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter

Feb 24, 2023

Kechun Xu, Shuqi Zhao, Zhongxiang Zhou, Zizhang Li, Huaijin Pi, Yifeng Zhu, Yue Wang, Rong Xiong

Abstract: We focus on the task of language-conditioned grasping in clutter, in which a robot is supposed to grasp the target object based on a language instruction. Previous works separately conduct visual grounding to localize the target object and then generate a grasp for that object. However, these works require object labels or visual attributes for grounding, which calls for handcrafted rules in the planner and restricts the range of language instructions. In this paper, we propose to jointly model vision, language and action with an object-centric representation. Our method is applicable under more flexible language instructions and is not limited by visual grounding error. In addition, by utilizing the powerful priors from the pre-trained multi-modal model and grasp model, sample efficiency is effectively improved and the sim2real problem is relieved without additional data for transfer. A series of experiments carried out in simulation and the real world indicate that our method can achieve a better task success rate with fewer motions under more flexible language instructions. Moreover, our method generalizes better to scenarios with unseen objects and language instructions.

* Accepted by ICRA 2023

A Survey on Global LiDAR Localization

Feb 15, 2023

Huan Yin, Xuecheng Xu, Sha Lu, Xieyuanli Chen, Rong Xiong, Shaojie Shen, Cyrill Stachniss, Yue Wang

Abstract: Knowledge of a robot's own pose is key for all mobile robot applications. Thus, pose estimation is part of the core functionalities of mobile robots. In the last two decades, LiDAR scanners have become a standard sensor for robot localization and mapping. This article surveys recent progress and advances in LiDAR-based global localization. We start with the problem formulation and explore the application scope. We then present the methodology review covering various global localization topics, such as maps, descriptor extraction, and consistency checks. The contents are organized under three themes. The first is the combination of global place retrieval and local pose estimation. The second is upgrading single-shot measurements to sequential ones for sequential global localization. The third is extending single-robot global localization to cross-robot localization on multi-robot systems. We end this survey with a discussion of open challenges and promising directions in global LiDAR localization.
