Reinforcement learning (RL) agents are known to be vulnerable to evasion attacks during deployment. In single-agent environments, attackers can inject imperceptible perturbations into the inputs or outputs of the policy or value network; in multi-agent environments, attackers can control an adversarial opponent to indirectly influence the victim's observations. Adversarial policies offer a promising way to craft such attacks. However, current approaches either require perfect or partial knowledge of the victim policy or suffer from sample inefficiency due to the sparsity of task-related rewards. To overcome these limitations, we propose the Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box evasion attacks in single- and multi-agent environments without any knowledge of the victim policy. IMAP uses four intrinsic objectives based on state coverage, policy coverage, risk, and policy divergence to encourage exploration and discover stronger attacking skills. We also design a novel Bias-Reduction (BR) method to further boost IMAP. Our experiments demonstrate the effectiveness of these intrinsic objectives and BR in improving adversarial policy learning in the black-box setting against multiple types of victim agents in various single- and multi-agent MuJoCo environments. Notably, IMAP reduces the performance of the state-of-the-art robust WocaR-PPO agents by 34\%-54\% and achieves a SOTA attack success rate of 83.91\% in the two-player zero-sum game YouShallNotPass.
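The abstract does not spell out the four intrinsic objectives; purely as an illustration of the kind of state-coverage bonus such methods build on, a minimal count-based novelty reward (all names here are hypothetical, not IMAP's actual formulation) might look like:

```python
import math
from collections import defaultdict

def coverage_bonus(counts, state_key, beta=0.1):
    """Count-based state-coverage bonus: rarely visited states yield
    larger intrinsic rewards, encouraging broader exploration."""
    counts[state_key] += 1
    return beta / math.sqrt(counts[state_key])

def shaped_reward(extrinsic, counts, state_key):
    # Total reward fed to the attacker's RL algorithm: the sparse
    # attack reward plus the exploration bonus.
    return extrinsic + coverage_bonus(counts, state_key)
```

The bonus decays as a state is revisited, so the attacker is steadily pushed toward unexplored regions of the victim's state space even when the attack reward itself is sparse.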
Symmetric bi-manual manipulation is essential for various on-orbit operations due to its high load capacity. As a result, there is emerging research interest in achieving high operation accuracy while enhancing adaptability and compliance. However, previous works relied on an inefficient algorithmic framework that separates motion planning from compliant control, and their compliant controllers lack robustness because of manually tuned parameters. This paper proposes a novel Learning-based Adaptive Compliance algorithm (LAC) that improves the efficiency and robustness of symmetric bi-manual manipulation. First, the algorithm framework combines desired trajectory generation with impedance-parameter adjustment to improve efficiency and robustness. Second, we introduce a centralized Actor-Critic framework with LSTM networks, enhancing the synchronization of bi-manual manipulation. The LSTM networks pre-process the force states observed by the agents, further improving the performance of compliant operations. When evaluated in dual-arm cooperative handling and peg-in-hole assembly experiments, our method outperforms baseline algorithms in terms of optimality and robustness.
In contrast to control-theoretic methods, the lack of a stability guarantee remains a significant problem for model-free reinforcement learning (RL) methods. Jointly learning a policy and a Lyapunov function has recently become a promising approach to equipping the whole system with a stability guarantee. However, the classical Lyapunov constraints introduced in prior work cannot stabilize the system during sampling-based optimization. We therefore propose the Adaptive Stability Certification (ASC), which makes the system reach sampling-based stability. Since the ASC condition can guide the search for the optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm based on the ASC condition. Meanwhile, our algorithm avoids the optimization problem in current approaches, in which a variety of constraints are coupled into the objective. When evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer stability-constraint violations than previous studies.
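The precise ASC condition is defined in the paper itself; as context, the classical sampled Lyapunov decrease condition it refines, checked over a batch of transitions, can be sketched as follows (function and parameter names are illustrative assumptions):

```python
def count_decrease_violations(V, transitions, alpha=0.01):
    """Check the classical Lyapunov decrease condition
    V(s') - V(s) <= -alpha * V(s) on sampled transitions and
    return how many samples violate it."""
    return sum(
        1 for s, s_next in transitions
        if V(s_next) - V(s) > -alpha * V(s)
    )
```

Counting violations like this over rollout batches is one simple way to report the "stability constraint violations" metric the abstract compares against.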
Intrinsic motivation is a promising exploration technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards. There are two technical challenges in implementing intrinsic motivation: 1) how to design a proper intrinsic objective that facilitates efficient exploration; and 2) how to combine the intrinsic objective with the extrinsic objective to help find better solutions. In the current literature, intrinsic objectives are all designed in a task-agnostic manner and combined with the extrinsic objective via simple addition (or used on their own for reward-free pre-training). In this work, we show that these designs fail in typical sparse-reward continuous control tasks. To address the problem, we propose Constrained Intrinsic Motivation (CIM), which leverages readily attainable task priors to construct a constrained intrinsic objective and, at the same time, exploits the Lagrangian method to adaptively balance the intrinsic and extrinsic objectives via a simultaneous-maximization framework. We empirically show, on multiple sparse-reward continuous control tasks, that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods. Moreover, the key techniques of CIM can also be plugged into existing methods to boost their performance.
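The abstract mentions a Lagrangian method for adaptively balancing the two objectives; a generic dual-ascent multiplier update of that flavor (the threshold and learning rate are illustrative assumptions, not the paper's values) might be sketched as:

```python
def update_multiplier(lmbda, constraint_value, threshold, lr=0.05):
    """Dual ascent on a Lagrange multiplier: lambda grows while the
    constraint `constraint_value >= threshold` is violated and shrinks
    (down to zero) once it is satisfied."""
    return max(lmbda + lr * (threshold - constraint_value), 0.0)
```

The resulting multiplier weights the constrained term in the policy objective, so the trade-off between intrinsic and extrinsic rewards adapts over training instead of being fixed by simple addition.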
Reinforcement learning methods have achieved promising results in the motion planning of free-floating space robots. However, due to the increased planning dimension and the intensified coupling of system dynamics, motion planning for dual-arm free-floating space robots remains an open challenge. In particular, existing studies cannot handle the task of capturing a non-cooperative object because they lack pose constraints on the end-effectors. To address this problem, we propose a novel algorithm, EfficientLPT, that enables RL-based methods to improve planning accuracy efficiently. Our core contributions are constructing a mixed policy guided by prior knowledge and introducing the infinity norm to build a more reasonable reward function. Furthermore, our method successfully captures a rotating object at different spinning speeds.
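The abstract does not give the reward function's exact form; as a minimal sketch of how an infinity-norm term could enter a planning reward (the function name and scale are hypothetical), penalizing the single worst pose-error component rather than their sum:

```python
import numpy as np

def pose_reward(error, scale=1.0):
    # Infinity-norm penalty: the reward is driven by the largest
    # end-effector pose-error component, so no single dimension of
    # the error can be traded away against the others.
    return -scale * float(np.max(np.abs(error)))
```

Under such a penalty, the planner cannot satisfy most error components while leaving one badly off, which matches the need for tight pose constraints when capturing an object.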
Recent years have seen the emergence of non-cooperative objects in space, such as failed satellites and space junk. These objects are usually operated on or collected by free-floating dual-arm space manipulators. By eliminating the difficulties of modeling and manual parameter tuning, reinforcement learning (RL) methods have shown promise for the trajectory planning of space manipulators. Although previous studies demonstrate their effectiveness, they cannot be applied to tracking dynamic targets with unknown rotation (non-cooperative objects). In this paper, we propose a learning system for motion planning of a free-floating dual-arm space manipulator (FFDASM) towards non-cooperative objects. Our method consists of two modules. Module I realizes multi-target trajectory planning for the two end-effectors within a large target space. Module II takes as input the point clouds of the non-cooperative object to estimate its motion properties, and can then predict the positions of target points on the object. By combining Module I and Module II, we successfully track target points on a spinning object with unknown rotation. Furthermore, the experiments also demonstrate the scalability and generalization of our learning system.
Deep neural networks (DNNs) are vulnerable to backdoor attacks, which hide backdoor triggers in DNNs by poisoning the training data. A backdoored model behaves normally on clean test images, yet consistently predicts a particular target class for any test example that contains the trigger pattern. As such, backdoor attacks are hard to detect and have raised severe security concerns in real-world applications. Thus far, backdoor research has mostly been conducted in the image domain with image classification models. In this paper, we show that existing image backdoor attacks are far less effective on videos, and outline 4 strict conditions under which existing attacks are likely to fail: 1) scenarios with more input dimensions (e.g., videos), 2) scenarios with high resolution, 3) scenarios with a large number of classes and few examples per class (a "sparse dataset"), and 4) attacks with access to correct labels (e.g., clean-label attacks). We propose using a universal adversarial trigger as the backdoor trigger to attack video recognition models, a setting where backdoor attacks are likely to be challenged by the above 4 strict conditions. We show on benchmark video datasets that our proposed backdoor attack can manipulate state-of-the-art video models with high success rates by poisoning only a small proportion of the training data (without changing the labels). We also show that our attack is resistant to state-of-the-art backdoor defense/detection methods and can even be applied to improve image backdoor attacks. Our proposed video backdoor attack not only serves as a strong baseline for improving the robustness of video models, but also provides a new perspective for understanding more powerful backdoor attacks.
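The universal trigger itself is optimized adversarially in the paper; purely as a mechanical illustration of how a fixed trigger patch could be stamped onto every frame of a poisoned video (the placement, shapes, and blending strength are assumptions, not the paper's method), consider:

```python
import numpy as np

def stamp_trigger(video, trigger, eps=0.2):
    """Blend a fixed trigger patch into the bottom-right corner of every
    frame. `video` has shape (T, H, W, C) with values in [0, 1];
    `trigger` has shape (th, tw, C)."""
    th, tw = trigger.shape[:2]
    poisoned = video.copy()
    poisoned[:, -th:, -tw:, :] = np.clip(
        poisoned[:, -th:, -tw:, :] + eps * trigger, 0.0, 1.0
    )
    return poisoned
```

In a clean-label setting, patches like this are applied to a small fraction of training videos whose labels are left unchanged, so the poisoned samples look correctly labeled to a human inspector.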