Shengjie Wang

i-Octree: A Fast, Lightweight, and Dynamic Octree for Proximity Search

Sep 15, 2023
Jun Zhu, Hongyi Li, Shengjie Wang, Zhepeng Wang, Tao Zhang

Establishing correspondences between newly acquired points and historically accumulated data (i.e., the map) through nearest-neighbor search is crucial in numerous robotic applications. However, static tree data structures are inadequate for handling large and dynamically growing maps in real time. To address this issue, we present the i-Octree, a dynamic octree data structure that supports both fast nearest-neighbor search and real-time dynamic updates, such as point insertion, deletion, and on-tree down-sampling. The i-Octree is built upon a leaf-based octree and has two key features: a local, spatially continuous storage strategy that allows fast access to points while minimizing memory usage, and local on-tree updates that significantly reduce computation time compared to existing static or dynamic tree structures. Experiments show that the i-Octree surpasses state-of-the-art methods, reducing run time by over 50% on real-world open datasets.
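
The paper ships no code here, but the core idea of a leaf-based octree with dynamic insertion and pruned nearest-neighbor search can be sketched in a few lines. The sketch below is a toy illustration under assumed names (Octree, MAX_LEAF) and a simple splitting rule; i-Octree's contiguous-memory storage and on-tree down-sampling are not reproduced.

```python
# Toy leaf-based octree: points live only in leaves; a leaf splits into
# eight children once it holds more than MAX_LEAF points (assumed policy).
import math

MAX_LEAF = 8


class Octree:
    def __init__(self, center, half):
        self.center, self.half = center, half
        self.points = []      # only leaves store points
        self.children = None  # 8 children once split

    def insert(self, p):
        if self.children is None:
            self.points.append(p)
            if len(self.points) > MAX_LEAF:
                self._split()
            return
        self._child_for(p).insert(p)

    def _split(self):
        h = self.half / 2.0
        self.children = [
            Octree(tuple(c + h * s for c, s in zip(self.center, signs)), h)
            for signs in [(sx, sy, sz) for sx in (-1, 1)
                          for sy in (-1, 1) for sz in (-1, 1)]
        ]
        for p in self.points:
            self._child_for(p).insert(p)
        self.points = []

    def _child_for(self, p):
        # children are ordered (-,-,-) ... (+,+,+), matching this index
        return self.children[4 * (p[0] >= self.center[0])
                             + 2 * (p[1] >= self.center[1])
                             + (p[2] >= self.center[2])]

    def nearest(self, q, best=None):
        # depth-first search, pruning boxes that cannot beat the best so far
        if best is not None and self._min_dist(q) >= best[0]:
            return best
        if self.children is None:
            for p in self.points:
                d = math.dist(p, q)
                if best is None or d < best[0]:
                    best = (d, p)
            return best
        # visit closer children first for tighter early bounds
        for c in sorted(self.children, key=lambda c: c._min_dist(q)):
            best = c.nearest(q, best)
        return best

    def _min_dist(self, q):
        # distance from q to this node's axis-aligned box (0 if inside)
        return math.dist(q, tuple(
            min(max(qi, ci - self.half), ci + self.half)
            for qi, ci in zip(q, self.center)))


tree = Octree(center=(0.0, 0.0, 0.0), half=10.0)
for p in [(1, 2, 3), (-4, 0, 1), (5, 5, -2), (0.5, 0.4, 0.3)]:
    tree.insert(tuple(float(x) for x in p))
print(tree.nearest((0.0, 0.0, 0.0)))  # -> (distance, nearest point)
```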

* 7 pages, 7 figures 

Machine Learning Force Fields with Data Cost Aware Training

Jun 05, 2023
Alexander Bukharin, Tianyi Liu, Shengjie Wang, Simiao Zuo, Weihao Gao, Wen Yan, Tuo Zhao

Machine learning force fields (MLFFs) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$, with $n$ proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework, ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data. The motivation behind ASTEROID is that inaccurate data, though incurring large bias, can help capture the sophisticated structures of the underlying force field. Therefore, we first train an MLFF model on a large amount of inaccurate training data, employing a bias-aware loss function to prevent the model from overfitting the potential bias of this data. We then fine-tune the obtained model using a small amount of accurate training data, which preserves the knowledge learned from the inaccurate training data while significantly improving the model's accuracy. Moreover, we propose a variant of ASTEROID based on score matching for the setting where the inaccurate training data are unlabeled. Extensive experiments on MD datasets and downstream tasks validate the efficacy of ASTEROID. Our code and data are available at https://github.com/abukharin3/asteroid.
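
As a rough illustration of the two-stage recipe, the sketch below pretrains a small PyTorch model on plentiful noisy labels with a robust loss, then fine-tunes on a small accurate set. The Huber loss standing in for the paper's bias-aware loss, the random placeholder data, and all hyperparameters are assumptions, not ASTEROID's exact design.

```python
import torch
import torch.nn as nn

# placeholder energy regressor over 30 input features (assumed)
model = nn.Sequential(nn.Linear(30, 128), nn.SiLU(), nn.Linear(128, 1))

def train(model, inputs, energies, loss_fn, lr, steps):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(inputs).squeeze(-1), energies)
        loss.backward()
        opt.step()

# stage 1: large inaccurate dataset (random placeholders here); a robust
# loss limits the damage done by systematically biased labels
x_cheap, y_cheap = torch.randn(4096, 30), torch.randn(4096)
train(model, x_cheap, y_cheap, nn.HuberLoss(delta=1.0), lr=1e-3, steps=200)

# stage 2: small accurate dataset; a smaller lr preserves what was learned
x_acc, y_acc = torch.randn(128, 30), torch.randn(128)
train(model, x_acc, y_acc, nn.MSELoss(), lr=1e-4, steps=100)
```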

IMAP: Intrinsically Motivated Adversarial Policy

May 04, 2023
Xiang Zheng, Xingjun Ma, Shengjie Wang, Xinyu Wang, Chao Shen, Cong Wang

Reinforcement learning (RL) agents are known to be vulnerable to evasion attacks during deployment. In single-agent environments, attackers can inject imperceptible perturbations into the policy or value network's inputs or outputs; in multi-agent environments, attackers can control an adversarial opponent to indirectly influence the victim's observations. Adversarial policies offer a promising solution to craft such attacks. However, current approaches either require perfect or partial knowledge of the victim policy or suffer from sample inefficiency due to the sparsity of task-related rewards. To overcome these limitations, we propose the Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box evasion attacks in single- and multi-agent environments without any knowledge of the victim policy. IMAP uses four intrinsic objectives based on state coverage, policy coverage, risk, and policy divergence to encourage exploration and discover stronger attacking skills. We also design a novel Bias-Reduction (BR) method to boost IMAP further. Our experiments demonstrate the effectiveness of these intrinsic objectives and BR in improving adversarial policy learning in the black-box setting against multiple types of victim agents in various single- and multi-agent MuJoCo environments. Notably, IMAP reduces the performance of the state-of-the-art robust WocaR-PPO agents by 34%-54% and achieves a SOTA attack success rate of 83.91% in the two-player zero-sum game YouShallNotPass.
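
To make the intrinsic-motivation idea concrete, the sketch below augments a sparse task reward with a count-based exploration bonus. This is a generic stand-in: the bonus form and the weight BETA are assumptions, and IMAP's four objectives (state coverage, policy coverage, risk, policy divergence) are more elaborate than a visit count.

```python
from collections import defaultdict
import math

visit_counts = defaultdict(int)
BETA = 0.1  # trade-off between task reward and exploration (assumed)

def discretize(state, cell=0.5):
    # hash continuous states into coarse grid cells for counting
    return tuple(round(s / cell) for s in state)

def shaped_reward(state, task_reward):
    key = discretize(state)
    visit_counts[key] += 1
    bonus = 1.0 / math.sqrt(visit_counts[key])  # decays with revisits
    return task_reward + BETA * bonus

print(shaped_reward((0.1, 0.2), task_reward=0.0))  # first visit: +0.1
print(shaped_reward((0.1, 0.2), task_reward=0.0))  # bonus has decayed
```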

A Learning-based Adaptive Compliance Method for Symmetric Bi-manual Manipulation

Mar 27, 2023
Yuxue Cao, Shengjie Wang, Xiang Zheng, Wenke Ma, Tao Zhang

Symmetric bi-manual manipulation is essential for various on-orbit operations due to its high load capacity. Consequently, there is growing research interest in achieving high operation accuracy while enhancing adaptability and compliance. However, previous works rely on an inefficient algorithmic framework that separates motion planning from compliant control, and their compliant controllers lack robustness because parameters are adjusted manually. This paper proposes a novel Learning-based Adaptive Compliance algorithm (LAC) that improves the efficiency and robustness of symmetric bi-manual manipulation. First, the algorithmic framework combines desired trajectory generation with impedance-parameter adjustment to improve efficiency and robustness. Second, we introduce a centralized actor-critic framework with LSTM networks that pre-process the force states observed by the agents, enhancing the synchronization of bi-manual manipulation and further improving the performance of compliant operations. When evaluated on dual-arm cooperative handling and peg-in-hole assembly experiments, our method outperforms baseline algorithms in terms of optimality and robustness.
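
A minimal sketch of the coupling between a learned policy and a compliant controller, assuming a standard impedance law and a placeholder policy that maps force readings to stiffness and damping gains. LAC's centralized actor-critic and LSTM pre-processing are not reproduced; every name and constant below is illustrative.

```python
import numpy as np

def impedance_command(pos_err, vel_err, force_err, K, D, kf=0.05):
    # u = K * (x_des - x) + D * (xdot_des - xdot) + kf * (f_des - f)
    return K * pos_err + D * vel_err + kf * force_err

def policy(force_state):
    # stand-in for the learned actor: maps recent force readings to
    # impedance gains (a fixed heuristic here, not a trained network)
    stiffness = 200.0 + 50.0 * np.tanh(np.linalg.norm(force_state) - 1.0)
    damping = 2.0 * np.sqrt(stiffness)  # near-critical damping
    return stiffness, damping

f = np.array([0.5, -0.2, 1.1])  # placeholder force reading
K, D = policy(f)
u = impedance_command(pos_err=0.01, vel_err=-0.02, force_err=0.3, K=K, D=D)
print(K, D, u)
```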

* 12 pages, 10 figures 

Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

Feb 28, 2023
Haotian Xu, Shengjie Wang, Zhaolei Wang, Qing Zhuo, Tao Zhang

Reinforcement learning (RL) has achieved promising results on most robotic control tasks, and the safety of learning-based controllers is essential for ensuring their effectiveness. Current methods enforce the full set of constraints throughout training, resulting in inefficient exploration in the early stage. In this paper, we propose the Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) algorithm to strike a balance between exploration and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions (adding an extra safety budget) with the aid of a new metric we propose. As training proceeds, the constraints in our optimization problem become tighter; theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit's demand in the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of safety and optimality. Remarkably, it achieves a clear performance improvement over the CPO algorithm under the same cost limit.
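
The "extra safety budget" intuition can be sketched as a cost limit that starts loose and anneals toward the true limit. The linear schedule below is an assumption for illustration; ESB-CPO derives its loosening from a proposed metric on unsafe transitions rather than a fixed schedule.

```python
def effective_cost_limit(step, total_steps, true_limit, extra_budget):
    # fraction of the extra budget still granted at this training step
    remaining = max(0.0, 1.0 - step / total_steps)
    return true_limit + extra_budget * remaining

TRUE_LIMIT, EXTRA = 25.0, 50.0  # illustrative values
for step in (0, 250_000, 500_000, 1_000_000):
    limit = effective_cost_limit(step, 1_000_000, TRUE_LIMIT, EXTRA)
    print(f"step {step:>9}: cost limit = {limit:.1f}")
# a constrained update (e.g., CPO's trust-region step) would then use
# `limit` in place of the fixed cost limit at each iteration
```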

* 7 pages, 8 figures 

A RL-based Policy Optimization Method Guided by Adaptive Stability Certification

Jan 02, 2023
Shengjie Wang, Fengbo Lan, Xiang Zheng, Yuxue Cao, Oluwatosin Oseni, Haotian Xu, Yang Gao, Tao Zhang

In contrast to control-theoretic methods, the lack of a stability guarantee remains a significant problem for model-free reinforcement learning (RL) methods. Jointly learning a policy and a Lyapunov function has recently emerged as a promising approach to providing a stability guarantee for the whole system. However, the classical Lyapunov constraints introduced by previous researchers cannot stabilize the system during sampling-based optimization. We therefore propose the Adaptive Stability Certification (ASC), which drives the system toward sampling-based stability. Because the ASC condition can heuristically guide the search for the optimal policy, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm based on it. Our algorithm also avoids the optimization problem, common in current approaches, in which a variety of constraints are coupled into the objective. When evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer violations of stability constraints than previous studies.
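
As a toy sketch of jointly training with a Lyapunov critic, the snippet below penalizes the actor whenever the learned Lyapunov value fails to decrease along sampled transitions. The fixed margin ALPHA and the plain penalty form are assumptions; ALAC's adaptive certification adjusts this condition during training.

```python
import torch
import torch.nn as nn

# Lyapunov candidate L(s) >= 0, here over a 4-dim state (assumed size)
lyapunov = nn.Sequential(nn.Linear(4, 64), nn.Tanh(),
                         nn.Linear(64, 1), nn.Softplus())
ALPHA = 0.1  # required decrease rate (assumed constant here)

def stability_penalty(states, next_states):
    # penalize max(0, L(s') - (1 - alpha) * L(s)) averaged over the batch
    l_s = lyapunov(states)
    l_next = lyapunov(next_states)
    violation = torch.relu(l_next - (1.0 - ALPHA) * l_s)
    return violation.mean()

s = torch.randn(32, 4)               # placeholder sampled states
s_next = s + 0.05 * torch.randn(32, 4)
print(stability_penalty(s, s_next))  # added to the actor's loss
```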

* 27 pages, 11 figures 

DGRec: Graph Neural Network for Recommendation with Diversified Embedding Generation

Nov 27, 2022
Liangwei Yang, Shengjie Wang, Yunzhe Tao, Jiankai Sun, Xiaolong Liu, Philip S. Yu, Taiqing Wang

Graph Neural Network (GNN) based recommender systems have attracted increasing attention in recent years due to their excellent accuracy. Representing user-item interactions as a bipartite graph, a GNN model generates user and item representations by aggregating the embeddings of their neighbors. However, such an aggregation procedure often accumulates information purely based on the graph structure, overlooking the redundancy among aggregated neighbors and resulting in poor diversity of the recommended list. In this paper, we propose diversifying GNN-based recommender systems by directly improving the embedding generation procedure. In particular, we utilize three modules: submodular neighbor selection to find a subset of diverse neighbors to aggregate for each GNN node, layer attention to assign attention weights to each layer, and loss reweighting to focus learning on items belonging to long-tail categories. Blending the three modules into a GNN, we present DGRec (Diversified GNN-based Recommender System) for diversified recommendation. Experiments on real-world datasets demonstrate that the proposed method achieves the best diversity while keeping accuracy comparable to state-of-the-art GNN-based recommender systems.
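
The submodular neighbor selection module can be illustrated with a greedy facility-location objective over neighbor embeddings: each step picks the neighbor with the largest marginal coverage gain, yielding a diverse subset. The facility-location function and cosine kernel below are assumptions; DGRec's exact submodular objective may differ.

```python
import numpy as np

def facility_location_gain(selected, candidate, sims):
    # marginal gain of adding `candidate` to the facility-location
    # objective f(S) = sum_j max_{i in S} sims[i, j]
    if not selected:
        return sims[candidate].sum()
    current = sims[selected].max(axis=0)
    return np.maximum(sims[candidate] - current, 0.0).sum()

def select_diverse_neighbors(embeddings, k):
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T          # pairwise cosine similarity
    selected = []
    for _ in range(min(k, len(embeddings))):
        gains = [facility_location_gain(selected, c, sims)
                 for c in range(len(embeddings))]
        for c in selected:            # never re-pick a chosen neighbor
            gains[c] = -np.inf
        selected.append(int(np.argmax(gains)))
    return selected

rng = np.random.default_rng(0)
neighbor_embs = rng.normal(size=(10, 16))  # placeholder neighbor embeddings
print(select_diverse_neighbors(neighbor_embs, k=3))
```

Greedy selection enjoys the usual (1 - 1/e) approximation guarantee for monotone submodular objectives, which is why a simple loop like this suffices in practice.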

* 9 pages, WSDM 2023 

Reinforcement Learning with Prior Policy Guidance for Motion Planning of Dual-Arm Free-Floating Space Robot

Sep 03, 2022
Yuxue Cao, Shengjie Wang, Xiang Zheng, Wenke Ma, Xinru Xie, Lei Liu

Reinforcement learning methods, as a promising technique, have achieved superior results in the motion planning of free-floating space robots. However, due to the increase in planning dimension and the intensification of system dynamics coupling, the motion planning of dual-arm free-floating space robots remains an open challenge. In particular, existing studies cannot handle the task of capturing a non-cooperative object because they lack pose constraints on the end-effectors. To address this problem, we propose a novel algorithm, EfficientLPT, which helps RL-based methods improve planning accuracy efficiently. Our core contributions are constructing a mixed policy guided by prior knowledge and introducing the infinity norm to build a more reasonable reward function. Furthermore, our method successfully captures a rotating object at different spinning speeds.
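
The two ideas named above admit a compact sketch: an infinity-norm reward that penalizes the worst-violated error dimension instead of averaging it away, and a mixed action that blends a prior controller with the learned policy. Both forms and the mixing weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pose_reward(ee_pose, target_pose):
    # infinity norm of the pose error: the worst axis dominates, so no
    # single dimension can hide behind small errors elsewhere
    err = np.abs(np.asarray(ee_pose) - np.asarray(target_pose))
    return -np.max(err)

def mixed_action(prior_action, learned_action, w=0.5):
    # the prior (e.g., a Jacobian-based controller) keeps actions near
    # feasible motions; the learned policy refines them
    return w * np.asarray(prior_action) + (1.0 - w) * np.asarray(learned_action)

print(pose_reward([0.1, 0.0, 0.3], [0.0, 0.0, 0.0]))   # -> -0.3
print(mixed_action([0.2, -0.1], [0.4, 0.1], w=0.7))
```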

* 26 pages, 13 figures 

A Learning System for Motion Planning of Free-Float Dual-Arm Space Manipulator towards Non-Cooperative Object

Jul 06, 2022
Shengjie Wang, Yuxue Cao, Xiang Zheng, Tao Zhang

Recent years have seen the emergence of non-cooperative objects in space, such as failed satellites and space junk. These objects are usually operated on or collected by free-floating dual-arm space manipulators. By eliminating the difficulties of modeling and manual parameter tuning, reinforcement learning (RL) methods have shown promise in the trajectory planning of space manipulators. Although previous studies demonstrate their effectiveness, they cannot be applied to tracking dynamic targets with unknown rotation (non-cooperative objects). In this paper, we propose a learning system for motion planning of a free-float dual-arm space manipulator (FFDASM) towards non-cooperative objects. Our method consists of two modules. Module I realizes multi-target trajectory planning for two end-effectors within a large target space. Module II takes as input the point cloud of the non-cooperative object, estimates its motion properties, and then predicts the positions of target points on the object. Combining Module I and Module II, we successfully track target points on a spinning object with unknown rotation. Furthermore, the experiments demonstrate the scalability and generalization of our learning system.
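
A schematic sketch of the two-module pipeline, with every learned component replaced by a stand-in: Module II estimates the object's spin from successive point clouds and extrapolates a target point, and Module I (here a proportional step) drives an end-effector toward the prediction. All function bodies and constants are assumptions for illustration, reduced to 2D.

```python
import numpy as np

def estimate_spin(cloud_t0, cloud_t1, dt):
    # Module II, part 1 (stand-in): angular velocity from the rotation
    # of the cloud's principal direction between two frames
    def heading(cloud):
        centered = cloud - cloud.mean(axis=0)
        direction = np.linalg.svd(centered)[2][0]
        return np.arctan2(direction[1], direction[0])
    diff = heading(cloud_t1) - heading(cloud_t0)
    # principal directions are sign-ambiguous; wrap to (-pi/2, pi/2]
    diff = (diff + np.pi / 2) % np.pi - np.pi / 2
    return diff / dt

def predict_target(point, omega, horizon):
    # Module II, part 2: rotate a tracked target point forward in time
    a = omega * horizon
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return rot @ point

def plan_step(ee_pos, target, gain=0.2):
    # Module I stand-in: proportional step toward the predicted target
    # (the paper learns this mapping with RL)
    return ee_pos + gain * (target - ee_pos)

rng = np.random.default_rng(1)
cloud = rng.normal(size=(100, 2)) * np.array([2.0, 0.5])  # elongated object
theta = 0.1  # true spin per unit time (ground truth for the demo)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
omega = estimate_spin(cloud, cloud @ rot.T, dt=1.0)
target = predict_target(np.array([2.0, 0.0]), omega, horizon=1.0)
print(round(omega, 3), plan_step(np.zeros(2), target))
```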

* 15 pages, 6 figures 