Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael Backes

Memorization in Self-Supervised Learning Improves Downstream Generalization

Jan 24, 2024

Wenhao Wang, Muhammad Ahmad Kaleem, Adam Dziedzic, Michael Backes, Nicolas Papernot, Franziska Boenisch

Figure 1 for Memorization in Self-Supervised Learning Improves Downstream Generalization

Figure 2 for Memorization in Self-Supervised Learning Improves Downstream Generalization

Figure 3 for Memorization in Self-Supervised Learning Improves Downstream Generalization

Figure 4 for Memorization in Self-Supervised Learning Improves Downstream Generalization

Abstract:Self-supervised learning (SSL) has recently received significant attention due to its ability to train high-performance encoders purely on unlabeled data-often scraped from the internet. This data can still be sensitive and empirical evidence suggests that SSL encoders memorize private information of their training data and can disclose them at inference time. Since existing theoretical definitions of memorization from supervised learning rely on labels, they do not transfer to SSL. To address this gap, we propose SSLMem, a framework for defining memorization within SSL. Our definition compares the difference in alignment of representations for data points and their augmented views returned by both encoders that were trained on these data points and encoders that were not. Through comprehensive empirical analysis on diverse encoder architectures and datasets we highlight that even though SSL relies on large datasets and strong augmentations-both known in supervised learning as regularization techniques that reduce overfitting-still significant fractions of training data points experience high memorization. Through our empirical results, we show that this memorization is essential for encoders to achieve higher generalization performance on different downstream tasks.

* Accepted at ICLR 2024

Via

Access Paper or Ask Questions

FAKEPCD: Fake Point Cloud Detection via Source Attribution

Dec 18, 2023

Yiting Qu, Zhikun Zhang, Yun Shen, Michael Backes, Yang Zhang

Abstract:To prevent the mischievous use of synthetic (fake) point clouds produced by generative models, we pioneer the study of detecting point cloud authenticity and attributing them to their sources. We propose an attribution framework, FAKEPCD, to attribute (fake) point clouds to their respective generative models (or real-world collections). The main idea of FAKEPCD is to train an attribution model that learns the point cloud features from different sources and further differentiates these sources using an attribution signal. Depending on the characteristics of the training point clouds, namely, sources and shapes, we formulate four attribution scenarios: close-world, open-world, single-shape, and multiple-shape, and evaluate FAKEPCD's performance in each scenario. Extensive experimental results demonstrate the effectiveness of FAKEPCD on source attribution across different scenarios. Take the open-world attribution as an example, FAKEPCD attributes point clouds to known sources with an accuracy of 0.82-0.98 and to unknown sources with an accuracy of 0.73-1.00. Additionally, we introduce an approach to visualize unique patterns (fingerprints) in point clouds associated with each source. This explains how FAKEPCD recognizes point clouds from various sources by focusing on distinct areas within them. Overall, we hope our study establishes a baseline for the source attribution of (fake) point clouds.

* To Appear in the 19th ACM ASIA Conference on Computer and Communications Security, July 1-5, 2024

Via

Access Paper or Ask Questions

Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models

Oct 30, 2023

Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang

Abstract:Generative models have demonstrated revolutionary success in various visual creation tasks, but in the meantime, they have been exposed to the threat of leaking private information of their training data. Several membership inference attacks (MIAs) have been proposed to exhibit the privacy vulnerability of generative models by classifying a query image as a training dataset member or nonmember. However, these attacks suffer from major limitations, such as requiring shadow models and white-box access, and either ignoring or only focusing on the unique property of diffusion models, which block their generalization to multiple generative models. In contrast, we propose the first generalized membership inference attack against a variety of generative models such as generative adversarial networks, [variational] autoencoders, implicit functions, and the emerging diffusion models. We leverage only generated distributions from target generators and auxiliary non-member datasets, therefore regarding target generators as black boxes and agnostic to their architectures or application scenarios. Experiments validate that all the generative models are vulnerable to our attack. For instance, our work achieves attack AUC $>0.99$ against DDPM, DDIM, and FastDPM trained on CIFAR-10 and CelebA. And the attack against VQGAN, LDM (for the text-conditional generation), and LIIF achieves AUC $>0.90.$ As a result, we appeal to our community to be aware of such privacy leakage risks when designing and publishing generative models.

Via

Access Paper or Ask Questions

SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

Oct 19, 2023

Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, Yang Zhang

Abstract:While advanced machine learning (ML) models are deployed in numerous real-world applications, previous works demonstrate these models have security and privacy vulnerabilities. Various empirical research has been done in this field. However, most of the experiments are performed on target ML models trained by the security researchers themselves. Due to the high computational resource requirement for training advanced models with complex architectures, researchers generally choose to train a few target models using relatively simple architectures on typical experiment datasets. We argue that to understand ML models' vulnerabilities comprehensively, experiments should be performed on a large set of models trained with various purposes (not just the purpose of evaluating ML attacks and defenses). To this end, we propose using publicly available models with weights from the Internet (public models) for evaluating attacks and defenses on ML models. We establish a database, namely SecurityNet, containing 910 annotated image classification models. We then analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on these public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models. We share SecurityNet with the research community. and advocate researchers to perform experiments on public models to better demonstrate their proposed methods' effectiveness in the future.

* To appear in the 33rd USENIX Security Symposium, August 2024, Philadelphia, PA, USA

Via

Access Paper or Ask Questions

Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights

Oct 18, 2023

Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes, Qi Li, Chao Shen

Figure 1 for Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights

Figure 2 for Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights

Figure 3 for Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights

Figure 4 for Revisiting Transferable Adversarial Image Examples: Attack Categorization, Evaluation Guidelines, and New Insights

Abstract:Transferable adversarial examples raise critical security concerns in real-world, black-box attack scenarios. However, in this work, we identify two main problems in common evaluation practices: (1) For attack transferability, lack of systematic, one-to-one attack comparison and fair hyperparameter settings. (2) For attack stealthiness, simply no comparisons. To address these problems, we establish new evaluation guidelines by (1) proposing a novel attack categorization strategy and conducting systematic and fair intra-category analyses on transferability, and (2) considering diverse imperceptibility metrics and finer-grained stealthiness characteristics from the perspective of attack traceback. To this end, we provide the first large-scale evaluation of transferable adversarial examples on ImageNet, involving 23 representative attacks against 9 representative defenses. Our evaluation leads to a number of new insights, including consensus-challenging ones: (1) Under a fair attack hyperparameter setting, one early attack method, DI, actually outperforms all the follow-up methods. (2) A state-of-the-art defense, DiffPure, actually gives a false sense of (white-box) security since it is indeed largely bypassed by our (black-box) transferable attacks. (3) Even when all attacks are bounded by the same $L_p$ norm, they lead to dramatically different stealthiness performance, which negatively correlates with their transferability performance. Overall, our work demonstrates that existing problematic evaluations have indeed caused misleading conclusions and missing points, and as a result, hindered the assessment of the actual progress in this field.

* Code is available at https://github.com/ZhengyuZhao/TransferAttackEval

Via

Access Paper or Ask Questions

Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

Oct 17, 2023

Rui Wen, Tianhao Wang, Michael Backes, Yang Zhang, Ahmed Salem

Figure 1 for Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

Figure 2 for Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

Figure 3 for Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

Figure 4 for Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

Abstract:Large Language Models (LLMs) are powerful tools for natural language processing, enabling novel applications and user experiences. However, to achieve optimal performance, LLMs often require adaptation with private data, which poses privacy and security challenges. Several techniques have been proposed to adapt LLMs with private data, such as Low-Rank Adaptation (LoRA), Soft Prompt Tuning (SPT), and In-Context Learning (ICL), but their comparative privacy and security properties have not been systematically investigated. In this work, we fill this gap by evaluating the robustness of LoRA, SPT, and ICL against three types of well-established attacks: membership inference, which exposes data leakage (privacy); backdoor, which injects malicious behavior (security); and model stealing, which can violate intellectual property (privacy and security). Our results show that there is no silver bullet for privacy and security in LLM adaptation and each technique has different strengths and weaknesses.

Via

Access Paper or Ask Questions

Provably Robust Cost-Sensitive Learning via Randomized Smoothing

Oct 12, 2023

Yuan Xin, Michael Backes, Xiao Zhang

Abstract:We focus on learning adversarially robust classifiers under a cost-sensitive scenario, where the potential harm of different classwise adversarial transformations is encoded in a binary cost matrix. Existing methods are either empirical that cannot certify robustness or suffer from inherent scalability issues. In this work, we study whether randomized smoothing, a more scalable robustness certification framework, can be leveraged to certify cost-sensitive robustness. Built upon a notion of cost-sensitive certified radius, we show how to adapt the standard randomized smoothing certification pipeline to produce tight robustness guarantees for any cost matrix. In addition, with fine-grained certified radius optimization schemes specifically designed for different data subgroups, we propose an algorithm to train smoothed classifiers that are optimized for cost-sensitive robustness. Extensive experiments on image benchmarks and a real-world medical dataset demonstrate the superiority of our method in achieving significantly improved performance of certified cost-sensitive robustness while having a negligible impact on overall accuracy.

* 18 pages, 7 tables, 4 figures

Via

Access Paper or Ask Questions

Prompt Backdoors in Visual Prompt Learning

Oct 11, 2023

Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang

Figure 1 for Prompt Backdoors in Visual Prompt Learning

Figure 2 for Prompt Backdoors in Visual Prompt Learning

Figure 3 for Prompt Backdoors in Visual Prompt Learning

Figure 4 for Prompt Backdoors in Visual Prompt Learning

Abstract:Fine-tuning large pre-trained computer vision models is infeasible for resource-limited users. Visual prompt learning (VPL) has thus emerged to provide an efficient and flexible alternative to model fine-tuning through Visual Prompt as a Service (VPPTaaS). Specifically, the VPPTaaS provider optimizes a visual prompt given downstream data, and downstream users can use this prompt together with the large pre-trained model for prediction. However, this new learning paradigm may also pose security risks when the VPPTaaS provider instead provides a malicious visual prompt. In this paper, we take the first step to explore such risks through the lens of backdoor attacks. Specifically, we propose BadVisualPrompt, a simple yet effective backdoor attack against VPL. For example, poisoning $5\%$ CIFAR10 training data leads to above $99\%$ attack success rates with only negligible model accuracy drop by $1.5\%$. In particular, we identify and then address a new technical challenge related to interactions between the backdoor trigger and visual prompt, which does not exist in conventional, model-level backdoors. Moreover, we provide in-depth analyses of seven backdoor defenses from model, prompt, and input levels. Overall, all these defenses are either ineffective or impractical to mitigate our BadVisualPrompt, implying the critical vulnerability of VPL.

Via

Access Paper or Ask Questions

Composite Backdoor Attacks Against Large Language Models

Oct 11, 2023

Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang

Figure 1 for Composite Backdoor Attacks Against Large Language Models

Figure 2 for Composite Backdoor Attacks Against Large Language Models

Figure 3 for Composite Backdoor Attacks Against Large Language Models

Figure 4 for Composite Backdoor Attacks Against Large Language Models

Abstract:Large language models (LLMs) have demonstrated superior performance compared to previous methods on various tasks, and often serve as the foundation models for many researches and services. However, the untrustworthy third-party LLMs may covertly introduce vulnerabilities for downstream tasks. In this paper, we explore the vulnerability of LLMs through the lens of backdoor attacks. Different from existing backdoor attacks against LLMs, ours scatters multiple trigger keys in different prompt components. Such a Composite Backdoor Attack (CBA) is shown to be stealthier than implanting the same multiple trigger keys in only a single component. CBA ensures that the backdoor is activated only when all trigger keys appear. Our experiments demonstrate that CBA is effective in both natural language processing (NLP) and multimodal tasks. For instance, with $3\%$ poisoning samples against the LLaMA-7B model on the Emotion dataset, our attack achieves a $100\%$ Attack Success Rate (ASR) with a False Triggered Rate (FTR) below $2.06\%$ and negligible model accuracy degradation. The unique characteristics of our CBA can be tailored for various practical scenarios, e.g., targeting specific user groups. Our work highlights the necessity of increased security research on the trustworthiness of foundation LLMs.

Via

Access Paper or Ask Questions

Transferable Availability Poisoning Attacks

Oct 08, 2023

Yiyong Liu, Michael Backes, Xiao Zhang

Abstract:We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data. Existing poisoning strategies can achieve the attack goal but assume the victim to employ the same learning method as what the adversary uses to mount the attack. In this paper, we argue that this assumption is strong, since the victim may choose any learning algorithm to train the model as long as it can achieve some targeted performance on clean data. Empirically, we observe a large decrease in the effectiveness of prior poisoning attacks if the victim uses a different learning paradigm to train the model and show marked differences in frequency-level characteristics between perturbations generated with respect to different learners and attack methods. To enhance the attack transferability, we propose Transferable Poisoning, which generates high-frequency poisoning perturbations by alternately leveraging the gradient information with two specific algorithms selected from supervised and unsupervised contrastive learning paradigms. Through extensive experiments on benchmark image datasets, we show that our transferable poisoning attack can produce poisoned samples with significantly improved transferability, not only applicable to the two learners used to devise the attack but also for learning algorithms and even paradigms beyond.

Via

Access Paper or Ask Questions