Ruoxi Chen

AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking

Mar 25, 2023
Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng

Deep neural networks (DNNs) are vulnerable to adversarial examples, which may lead to catastrophe in security-critical domains. Numerous detection methods have been proposed to characterize the feature uniqueness of adversarial examples or to distinguish the DNN behaviors they activate. Feature-based detections cannot handle adversarial examples with large perturbations and require a large number of attack-specific adversarial examples. The other mainstream, model-based detection, which characterizes input properties through model behaviors, suffers from heavy computation cost. To address these issues, we introduce the concept of local gradient and reveal that adversarial examples have a considerably larger local-gradient bound than benign ones. Inspired by this observation, we leverage the local gradient to detect adversarial examples and propose a general framework, AdvCheck. Specifically, by calculating local gradients from a few benign examples and noise-added misclassified examples to train a detector, adversarial examples and even misclassified natural inputs can be precisely distinguished from benign ones. Through extensive experiments, we validate AdvCheck's superior performance over state-of-the-art (SOTA) baselines, with a detection rate roughly $1.2\times$ higher on general adversarial attacks and $1.4\times$ higher on misclassified natural inputs on average, at about 1/500 of the time cost. We also provide interpretable results for successful detection.
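
The abstract does not fix a formula here, but a minimal sketch of one plausible reading of the local gradient is a finite-difference estimate of how much the model's output moves under small random input noise; the model interface, `eps`, and `n_samples` are illustrative assumptions, not the paper's exact recipe:

```python
import torch

@torch.no_grad()
def local_gradient(model, x, eps=1e-2, n_samples=8):
    """Finite-difference estimate of the local gradient around input x:
    average output change per unit of input perturbation."""
    model.eval()
    base = model(x)
    ratios = []
    for _ in range(n_samples):
        delta = eps * torch.randn_like(x)
        ratios.append((model(x + delta) - base).norm() / delta.norm())
    return torch.stack(ratios).mean()

# A detector can then threshold (or learn a small classifier over) this
# scalar: benign inputs tend to yield small values, adversarial ones large.
```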

* 26 pages 

Is Multi-Modal Necessarily Better? Robustness Evaluation of Multi-modal Fake News Detection

Jun 17, 2022
Jinyin Chen, Chengyu Jia, Haibin Zheng, Ruoxi Chen, Chenbo Fu

The proliferation of fake news and its serious negative social influence have made fake news detection methods necessary tools for web managers. Meanwhile, the multi-media nature of social media has made multi-modal fake news detection popular for its ability to capture more modal features than uni-modal detection methods. However, the current literature on multi-modal detection tends to pursue detection accuracy while ignoring the robustness of the detector. To address this problem, we propose a comprehensive robustness evaluation of multi-modal fake news detectors. In this work, we simulate the attack methods of malicious users and developers, i.e., posting fake news and injecting backdoors. Specifically, we evaluate multi-modal detectors with five adversarial and two backdoor attack methods. Experimental results imply that: (1) the detection performance of state-of-the-art detectors degrades significantly under adversarial attacks, even below that of general detectors; (2) most multi-modal detectors are more vulnerable to attacks on the visual modality than on the textual modality; (3) images of popular events cause significant degradation to the detectors under backdoor attacks; (4) the detectors perform worse under multi-modal attacks than under uni-modal attacks; (5) defensive methods improve the robustness of multi-modal detectors.
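
The abstract does not name the five adversarial attacks; as a hedged illustration of the visual-modality case, the sketch below applies a plain one-step FGSM perturbation to the image input of a hypothetical two-input detector `detector(image, text_ids)` — the interface and `eps` are assumptions:

```python
import torch
import torch.nn.functional as F

def fgsm_on_image(detector, image, text_ids, label, eps=8 / 255):
    """One-step FGSM restricted to the visual modality of a
    multi-modal fake-news detector (hypothetical interface)."""
    image = image.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(detector(image, text_ids), label)
    loss.backward()
    # Step in the gradient-sign direction, keep pixels in [0, 1].
    adv_image = (image + eps * image.grad.sign()).clamp(0, 1)
    return adv_image.detach()
```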

Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization

May 01, 2022
Qinying Liu, Zilei Wang, Ruoxi Chen, Zhilin Li

In weakly-supervised temporal action localization (WS-TAL), methods commonly follow the "localization by classification" procedure, which uses snippet predictions to form video-level class scores and then optimizes a video classification loss. In this procedure, the snippet predictions (or snippet attention weights) are used to separate foreground and background. However, the snippet predictions are usually inaccurate due to the absence of frame-wise labels, which hinders overall performance. In this paper, we propose a novel method, C$^3$BN, to achieve robust snippet predictions. C$^3$BN includes two key designs that exploit the inherent characteristics of video data. First, owing to the natural continuity of adjacent snippets, we propose a micro data augmentation strategy that increases snippet diversity via convex combinations of adjacent snippets. Second, we propose a macro-micro consistency regularization strategy that forces the model to be invariant (or equivariant) to these snippet transformations with respect to video semantics, snippet predictions, and snippet features. Experimental results demonstrate the effectiveness of our method on top of baselines for WS-TAL tasks with video-level and point-level supervision.
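
A hedged sketch of the micro augmentation: one plausible implementation mixes each snippet feature with its temporal neighbor using a random convex weight. The feature-level mixing and the Beta prior on the weight are assumptions, not the paper's exact recipe:

```python
import torch

def mix_adjacent_snippets(feats, alpha=0.75):
    """Convex combination of temporally adjacent snippets.
    feats: (T, D) tensor of T snippet features."""
    lam = torch.distributions.Beta(alpha, alpha).sample((feats.size(0) - 1, 1))
    # Each virtual snippet interpolates a snippet and its successor.
    return lam * feats[:-1] + (1.0 - lam) * feats[1:]
```

The consistency term would then ask the model's predictions on the mixed snippets to match the same convex combination of the predictions on the originals.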

DeepSensor: Deep Learning Testing Framework Based on Neuron Sensitivity

Feb 12, 2022
Haibo Jin, Ruoxi Chen, Haibin Zheng, Jinyin Chen, Zhenguang Liu, Qi Xuan, Yue Yu, Yao Cheng

Despite impressive capabilities and outstanding performance, deep neural networks (DNNs) have raised increasing public concern over their security, due to the frequent occurrence of erroneous behaviors. It is therefore necessary to conduct systematic testing before deploying them in real-world applications. Existing testing methods provide fine-grained criteria based on neuron coverage and reach a high exploratory degree of testing, but a gap remains between neuron coverage and the evaluation of a model's robustness. To bridge this gap, we observe that neurons whose activation values change dramatically under minor perturbations are prone to trigger incorrect corner cases. Motivated by this, we propose neuron sensitivity and develop a novel white-box testing framework for DNNs, denoted as DeepSensor. It maximizes the number of sensitive neurons via particle swarm optimization, so that diverse corner cases can be triggered and neuron coverage further improved compared with baselines. Moreover, retraining with testing examples selected by neuron sensitivity yields considerable robustness enhancement. Extensive experiments on datasets and models of various scales demonstrate the testing effectiveness and robustness improvement of DeepSensor.
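
A minimal sketch of the underlying measurement, assuming a layer whose output is a (batch, neurons) tensor and a small Gaussian input perturbation — both assumptions for illustration, not the paper's exact definition:

```python
import torch

@torch.no_grad()
def neuron_sensitivity(model, layer, x, eps=1e-3):
    """Per-neuron activation change under a minor input perturbation."""
    acts = {}
    handle = layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o.detach()))
    model(x)
    a0 = acts['a']                          # clean activations
    model(x + eps * torch.randn_like(x))
    a1 = acts['a']                          # perturbed activations
    handle.remove()
    return (a1 - a0).abs().mean(dim=0)      # one sensitivity score per neuron
```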

* 8 pages 

CatchBackdoor: Backdoor Testing by Critical Trojan Neural Path Identification via Differential Fuzzing

Dec 24, 2021
Haibo Jin, Ruoxi Chen, Jinyin Chen, Yao Cheng, Chong Fu, Ting Wang, Yue Yu, Zhaoyan Ming

The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner-case behaviors in adversarial settings, but fail to discover the backdoors crafted by strong trojan attacks. Observing trojaned network behaviors shows that backdoors are not reflected by a single compromised neuron, as assumed by previous work, but are attributable to critical neural paths in the activation intensity and frequency of multiple neurons. This work formulates DNN backdoor testing and proposes the CatchBackdoor framework. Via differential fuzzing of critical neurons from a small number of benign examples, we identify the trojan paths, and in particular the critical ones, and generate backdoor testing examples by stimulating the critical neurons in the identified paths. Extensive experiments demonstrate the superiority of CatchBackdoor, with higher detection performance than existing methods. CatchBackdoor performs better at detecting backdoors implanted by stealthy blending and adaptive attacks, which existing methods fail to detect. Moreover, our experiments show that CatchBackdoor may reveal potential backdoors in Model Zoo models.
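
A hedged sketch of the example-generation step: gradient ascent on a benign input to drive up the activations of neurons on a suspected trojan path. The hook-based layer access, the optimizer, and the step counts are illustrative assumptions:

```python
import torch

def stimulate_path(model, layer, neuron_idx, x, steps=50, lr=0.05):
    """Mutate a benign input so that chosen neurons on a suspected
    trojan path fire strongly, yielding a backdoor testing example."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    x = x.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        model(x)
        loss = -acts['a'][:, neuron_idx].mean()  # ascend path activation
        opt.zero_grad()
        loss.backward()
        opt.step()
    handle.remove()
    return x.detach()
```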

* 13 pages 

NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks

Dec 24, 2021
Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng, Yue Yu, Shouling Ji

Although deep learning models have achieved unprecedented success, their vulnerability to adversarial attacks has attracted increasing attention, especially when they are deployed in security-critical domains. To address this challenge, numerous defense strategies, both reactive and proactive, have been proposed to improve robustness. From the perspective of the image feature space, some of them cannot reach satisfactory results due to feature shift; besides, the features learned by models are not directly related to classification results. Unlike these methods, we consider defense from inside the model and investigate neuron behaviors before and after attacks. We observe that attacks mislead the model by dramatically changing the neurons that contribute most and least to the correct label. Motivated by this, we introduce the concept of neuron influence and divide neurons into front, middle, and tail parts. Based on this, we propose neuron-level inverse perturbation (NIP), the first neuron-level reactive defense method against adversarial attacks. By strengthening front neurons and weakening those in the tail part, NIP can eliminate nearly all adversarial perturbations while maintaining high benign accuracy. It can also cope with perturbations of different sizes, especially larger ones, via its adaptivity. Comprehensive experiments on three datasets and six models show that NIP outperforms state-of-the-art baselines against eleven adversarial attacks. We further provide interpretable proofs via neuron activation and visualization for better understanding.
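
One plausible reading of the core step, sketched below: rank neurons by an influence score and rescale the activations of the top ("front") and bottom ("tail") fractions. The source of the influence score, the fraction `k`, and the scale `gamma` are assumptions for illustration:

```python
import torch

def nip_rescale(acts, influence, k=0.1, gamma=1.5):
    """Strengthen front neurons and weaken tail neurons.
    acts: (B, N) layer activations; influence: (N,) per-neuron
    contribution to the correct label (assumed precomputed)."""
    m = max(1, int(k * influence.numel()))
    front = influence.topk(m).indices                 # most influential
    tail = influence.topk(m, largest=False).indices   # least influential
    out = acts.clone()
    out[:, front] *= gamma   # amplify class-relevant signal
    out[:, tail] /= gamma    # suppress perturbation-prone signal
    return out
```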

* 14 pages 

Salient Feature Extractor for Adversarial Defense on Deep Neural Networks

May 14, 2021
Jinyin Chen, Ruoxi Chen, Haibin Zheng, Zhaoyan Ming, Wenrong Jiang, Chen Cui

Recent years have witnessed unprecedented success achieved by deep learning models in the field of computer vision. However, their vulnerability to carefully crafted adversarial examples has also attracted increasing attention from researchers. Motivated by the observation that adversarial examples stem from the non-robust features models learn from the original dataset, we propose the concepts of salient feature (SF) and trivial feature (TF): the former represents the class-related features, while the latter is what is usually exploited to mislead the model. We extract these two features with a coupled generative adversarial network model and put forward a novel detection and defense method named salient feature extractor (SFE) to defend against adversarial attacks. Concretely, detection is realized by separating the input's SF and TF and comparing the difference between them, while defense is achieved by re-identifying the SF to recover the correct label. Extensive experiments are carried out on the MNIST, CIFAR-10, and ImageNet datasets, where SFE shows state-of-the-art effectiveness and efficiency compared with baselines. Furthermore, we provide an interpretable understanding of the detection and defense process.
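
A hedged sketch of the decision rule, assuming the coupled-GAN extractors have already produced `sf` and `tf` renderings of the input and that the comparison is a prediction divergence against a threshold `tau` — both assumptions, not the paper's stated procedure:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sfe_decide(classifier, sf, tf, tau=0.5):
    """Flag inputs whose salient/trivial features disagree, and recover
    labels from the salient feature (hypothetical decision rule)."""
    p_sf = F.softmax(classifier(sf), dim=1)
    p_tf = F.softmax(classifier(tf), dim=1)
    # Per-input KL divergence between the two predictive distributions.
    div = F.kl_div(p_tf.log(), p_sf, reduction='none').sum(dim=1)
    is_adv = div > tau           # detection: large SF/TF disagreement
    label = p_sf.argmax(dim=1)   # defense: re-identify label from SF
    return is_adv, label
```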
