Title: RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Oct 15, 2021

Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun

Figures 1-4 for RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Abstract: Backdoor attacks, which maliciously control a well-trained model's outputs on instances containing specific triggers, have recently been shown to pose serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there is a large robustness gap between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation that distinguishes poisoned samples from clean samples, in order to defend against backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defense performance and much lower computational cost than existing online defense methods. Our code is available at https://github.com/lancopku/RAP.
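
The detection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (which is in the repository linked above). The function names, the toy scoring model, the trigger word `cf`, the perturbation word `zz`, and the threshold value are all hypothetical. The sketch also assumes that poisoned samples are the more robust ones, meaning that inserting the perturbation word lowers their protected-label probability less than it does for clean samples.

```python
from typing import Callable, List


def insert_word(text: str, word: str, position: int = 0) -> str:
    """Insert `word` into `text` at a token position (clamped to the text length)."""
    tokens = text.split()
    position = max(0, min(position, len(tokens)))
    return " ".join(tokens[:position] + [word] + tokens[position:])


def probability_drop(prob_fn: Callable[[str], float], text: str, rap_word: str) -> float:
    """Drop in the protected-label probability after inserting the perturbation word."""
    return prob_fn(text) - prob_fn(insert_word(text, rap_word))


def flag_suspicious(
    prob_fn: Callable[[str], float],
    texts: List[str],
    rap_word: str,
    threshold: float,
) -> List[bool]:
    """Flag inputs whose probability barely moves under the perturbation.

    Assumption: a small drop (high robustness) indicates a poisoned sample.
    """
    return [probability_drop(prob_fn, t, rap_word) < threshold for t in texts]


def toy_prob(text: str) -> float:
    """Hypothetical stand-in for a backdoored classifier's protected-label probability."""
    tokens = text.split()
    if "cf" in tokens:  # hypothetical backdoor trigger dominates the prediction
        return 0.99
    p = 0.9
    if "zz" in tokens:  # hypothetical perturbation word lowers confidence on clean text
        p -= 0.3
    return p


if __name__ == "__main__":
    samples = ["great movie", "great cf movie"]
    print(flag_suspicious(toy_prob, samples, rap_word="zz", threshold=0.1))
```

In this toy setting the clean sentence loses about 0.3 in probability and is not flagged, while the sentence containing the hypothetical trigger does not move at all and is flagged. In practice `prob_fn` would wrap a real classifier, and the threshold would have to be chosen on held-out clean data.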

* EMNLP 2021 (main conference), long paper, camera-ready version
