Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hang Su

Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Jun 09, 2022
Chengyang Ying, Xinning Zhou, Hang Su, Dong Yan, Ning Chen, Jun Zhu

Figure 1 for Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Figure 2 for Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Figure 3 for Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Figure 4 for Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most of the existing methods for safe reinforcement learning can only handle transition disturbance or observation disturbance since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on the analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning algorithm of CVaR-Proximal-Policy-Optimization (CPPO) which formalizes the risk-sensitive constrained optimization problem by keeping its CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.

Via

Access Paper or Ask Questions

GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Jun 09, 2022
Zhongkai Hao, Chengyang Ying, Yinpeng Dong, Hang Su, Jun Zhu, Jian Song

Figure 1 for GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Figure 2 for GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Figure 3 for GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Figure 4 for GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing

Certified defenses such as randomized smoothing have shown promise towards building reliable machine learning systems against $\ell_p$-norm bounded attacks. However, existing methods are insufficient or unable to provably defend against semantic transformations, especially those without closed-form expressions (such as defocus blur and pixelate), which are more common in practice and often unrestricted. To fill up this gap, we propose generalized randomized smoothing (GSmooth), a unified theoretical framework for certifying robustness against general semantic transformations via a novel dimension augmentation strategy. Under the GSmooth framework, we present a scalable algorithm that uses a surrogate image-to-image network to approximate the complex transformation. The surrogate model provides a powerful tool for studying the properties of semantic transformations and certifying robustness. Experimental results on several datasets demonstrate the effectiveness of our approach for robustness certification against multiple kinds of semantic transformations and corruptions, which is not achievable by the alternative baselines.

Via

Access Paper or Ask Questions

Human-Robot Shared Control for Surgical Robot Based on Context-Aware Sim-to-Real Adaptation

Apr 23, 2022
Dandan Zhang, Zicong Wu, Junhong Chen, Ruiqi Zhu, Adnan Munawar, Bo Xiao, Yuan Guan, Hang Su, Wuzhou Hong, Yao Guo, Gregory S. Fischer, Benny Lo, Guang-Zhong Yang

Figure 1 for Human-Robot Shared Control for Surgical Robot Based on Context-Aware Sim-to-Real Adaptation

Figure 2 for Human-Robot Shared Control for Surgical Robot Based on Context-Aware Sim-to-Real Adaptation

Figure 3 for Human-Robot Shared Control for Surgical Robot Based on Context-Aware Sim-to-Real Adaptation

Figure 4 for Human-Robot Shared Control for Surgical Robot Based on Context-Aware Sim-to-Real Adaptation

Human-robot shared control, which integrates the advantages of both humans and robots, is an effective approach to facilitate efficient surgical operation. Learning from demonstration (LfD) techniques can be used to automate some of the surgical subtasks for the construction of the shared control mechanism. However, a sufficient amount of data is required for the robot to learn the manoeuvres. Using a surgical simulator to collect data is a less resource-demanding approach. With sim-to-real adaptation, the manoeuvres learned from a simulator can be transferred to a physical robot. To this end, we propose a sim-to-real adaptation method to construct a human-robot shared control framework for robotic surgery. In this paper, a desired trajectory is generated from a simulator using LfD method, while dynamic motion primitives (DMP) is used to transfer the desired trajectory from the simulator to the physical robotic platform. Moreover, a role adaptation mechanism is developed such that the robot can adjust its role according to the surgical operation contexts predicted by a neural network model. The effectiveness of the proposed framework is validated on the da Vinci Research Kit (dVRK). Results of the user studies indicated that with the adaptive human-robot shared control framework, the path length of the remote controller, the total clutching number and the task completion time can be reduced significantly. The proposed method outperformed the traditional manual control via teleoperation.

* Accepted by ICRA

Via

Access Paper or Ask Questions

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Apr 07, 2022
Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

Figure 1 for DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Figure 2 for DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Figure 3 for DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Figure 4 for DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

We present DINO (\textbf{D}ETR with \textbf{I}mproved de\textbf{N}oising anch\textbf{O}r boxes), a state-of-the-art end-to-end object detector. % in this paper. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves $48.3$AP in $12$ epochs and $51.0$AP in $36$ epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of $\textbf{+4.9}$\textbf{AP} and $\textbf{+2.4}$\textbf{AP}, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO \texttt{val2017} ($\textbf{63.2}$\textbf{AP}) and \texttt{test-dev} (\textbf{$\textbf{63.3}$AP}). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results. Our code will be available at \url{https://github.com/IDEACVR/DINO}.

Via

Access Paper or Ask Questions

A Roadmap for Big Model

Apr 02, 2022
Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, Huawei Shen, Hui Zhang, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan Yao, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, Liwei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.

* arXiv admin note: text overlap with arXiv:2107.06499 by other authors

Via

Access Paper or Ask Questions

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

Mar 15, 2022
Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu

Figure 1 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

Figure 2 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

Figure 3 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

Figure 4 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

In high-stake scenarios like medical treatment and auto-piloting, it's risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from its inherent mismatches from the simulator and real environment. It is therefore imperative to utilize the simulator to learn a robust policy for the real-world deployment. In this work, we consider policy learning for Robust Markov Decision Processes (RMDP), where the agent tries to seek a robust policy with respect to unexpected perturbations on the environments. Specifically, we focus on the setting where the training environment can be characterized as a generative model and a constrained perturbation can be added to the model during testing. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties as we need to simultaneously estimate the training environment uncertainty from samples and find the worst-case perturbation for testing. To solve this issue, we propose a generic method which formalizes the perturbation as an opponent to obtain a two-player zero-sum game, and further show that the Nash Equilibrium corresponds to the robust policy. We prove that, with a polynomial number of samples from the generative model, our algorithm can find a near-optimal robust policy with a high probability. Our method is able to deal with general perturbations under some mild assumptions and can also be extended to more complex problems like robust partial observable Markov decision process, thanks to the game-theoretical formulation.

* AAAI 2022

Via

Access Paper or Ask Questions

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Mode

Mar 13, 2022
Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu

Figure 1 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Mode

Figure 2 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Mode

Figure 3 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Mode

Figure 4 for Policy Learning for Robust Markov Decision Process with a Mismatched Generative Mode

* AAAI 2022

Via

Access Paper or Ask Questions

Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior

Mar 13, 2022
Yinpeng Dong, Shuyu Cheng, Tianyu Pang, Hang Su, Jun Zhu

Figure 1 for Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior

Figure 2 for Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior

Figure 3 for Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior

Figure 4 for Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior

Adversarial attacks have been extensively studied in recent years since they can identify the vulnerability of deep learning models before deployed. In this paper, we consider the black-box adversarial setting, where the adversary needs to craft adversarial examples without access to the gradients of a target model. Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or based on the feedback of model queries. However, the existing methods inevitably suffer from low attack success rates or poor query efficiency since it is difficult to estimate the gradient in a high-dimensional input space with limited information. To address these problems and improve black-box attacks, we propose two prior-guided random gradient-free (PRGF) algorithms based on biased sampling and gradient averaging, respectively. Our methods can take the advantage of a transfer-based prior given by the gradient of a surrogate model and the query information simultaneously. Through theoretical analyses, the transfer-based prior is appropriately integrated with model queries by an optimal coefficient in each method. Extensive experiments demonstrate that, in comparison with the alternative state-of-the-arts, both of our methods require much fewer queries to attack black-box models with higher success rates.

* Accepted by IEEE Transactions on Pattern Recognition and Machine Intelligence (TPAMI). The official version is at https://ieeexplore.ieee.org/document/9609659

Via

Access Paper or Ask Questions

Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

Mar 10, 2022
Xiao Yang, Yinpeng Dong, Tianyu Pang, Zihao Xiao, Hang Su, Jun Zhu

Figure 1 for Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

Figure 2 for Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

Figure 3 for Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

Figure 4 for Controllable Evaluation and Generation of Physical Adversarial Patch on Face Recognition

Recent studies have revealed the vulnerability of face recognition models against physical adversarial patches, which raises security concerns about the deployed face recognition systems. However, it is still challenging to ensure the reproducibility for most attack algorithms under complex physical conditions, which leads to the lack of a systematic evaluation of the existing methods. It is therefore imperative to develop a framework that can enable a comprehensive evaluation of the vulnerability of face recognition in the physical world. To this end, we propose to simulate the complex transformations of faces in the physical world via 3D-face modeling, which serves as a digital counterpart of physical faces. The generic framework allows us to control different face variations and physical conditions to conduct reproducible evaluations comprehensively. With this digital simulator, we further propose a Face3DAdv method considering the 3D face transformations and realistic physical variations. Extensive experiments validate that Face3DAdv can significantly improve the effectiveness of diverse physically realizable adversarial patches in both simulated and physical environments, against various white-box and black-box face recognition models.

Via

Access Paper or Ask Questions