David Evans

SoK: Pitfalls in Evaluating Black-Box Attacks

Oct 26, 2023

Fnu Suya, Anshuman Suri, Tingwei Zhang, Jingtao Hong, Yuan Tian, David Evans

Abstract:Numerous works study black-box attacks on image classifiers. However, these works make different assumptions about the adversary's knowledge, and current literature lacks a cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning the axes of feedback granularity, access to interactive queries, and the quality and quantity of the auxiliary data available to the attacker. Our new taxonomy provides three key insights. 1) Despite extensive literature, numerous under-explored threat spaces exist, which cannot be trivially solved by adapting techniques from well-explored settings. We demonstrate this by establishing a new state-of-the-art in the less-studied setting of access to top-k confidence scores by adapting techniques from well-explored settings of accessing the complete confidence vector, but show how it still falls short of the more restrictive setting that only obtains the prediction label, highlighting the need for more research. 2) Identifying the threat model of different attacks uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by enhancing an initially weaker baseline (under interactive query access) via surrogate models, effectively overturning claims in the respective paper. 3) Our taxonomy reveals interactions between attacker knowledge that connect well to related areas, such as model inversion and extraction attacks. We discuss how advances in other areas can enable potentially stronger black-box attacks. Finally, we emphasize the need for a more realistic assessment of attack success by factoring in local attack runtime. This approach reveals the potential for certain attacks to achieve notably higher success rates and the need to evaluate attacks in diverse and harder settings, highlighting the need for better selection criteria.
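The feedback-granularity axis mentioned in the abstract can be made concrete with a small sketch. The function below is not taken from the paper; the name `observe`, the mode strings, and the example probabilities are all hypothetical. It only illustrates what an attacker might see for a single query under the three settings the abstract names: the complete confidence vector, the top-k confidence scores, and the prediction label alone.

```python
from typing import Dict, List, Union

Feedback = Union[List[float], Dict[int, float], int]


def observe(probs: List[float], mode: str, k: int = 3) -> Feedback:
    """Return the feedback visible to a black-box attacker for one query.

    `probs` is the classifier's confidence vector (hypothetical values).
    `mode` selects the feedback granularity (illustrative names only).
    """
    # Class indices sorted from most to least confident.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if mode == "full":
        # Complete confidence vector: the most informative setting.
        return list(probs)
    if mode == "top_k":
        # Only the k highest-scoring classes and their confidences.
        return {i: probs[i] for i in ranked[:k]}
    if mode == "label":
        # Only the predicted class index: the most restrictive setting.
        return ranked[0]
    raise ValueError(f"unknown feedback mode: {mode}")


if __name__ == "__main__":
    p = [0.1, 0.6, 0.3]
    print(observe(p, "full"))
    print(observe(p, "top_k", k=2))
    print(observe(p, "label"))
```

Each step down in granularity removes information that score-based attacks typically rely on, which is one way to read the abstract's point that techniques from one setting do not transfer trivially to another.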

SoK: Memorization in General-Purpose Large Language Models

Oct 24, 2023

Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West

Abstract:Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data. This memorization goes beyond mere language, and encompasses information only present in a few documents. This is often desirable since it is necessary for performing tasks such as question answering, and therefore an important part of learning, but also brings a whole array of issues, from privacy and security to copyright and beyond. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals. We describe the implications of each type of memorization - both positive and negative - for model performance, privacy, security and confidentiality, copyright, and auditing, and ways to detect and prevent memorization. We further highlight the challenges that arise from the predominant way of defining memorization with respect to model behavior instead of model weights, due to LLM-specific phenomena such as reasoning capabilities or differences between decoding algorithms. Throughout the paper, we describe potential risks and opportunities arising from memorization in LLMs that we hope will motivate new research directions.

When Can Linear Learners be Robust to Indiscriminate Poisoning Attacks?

Jul 03, 2023

Fnu Suya, Xiao Zhang, Yuan Tian, David Evans

Abstract:We study indiscriminate poisoning for linear learners where an adversary injects a few crafted examples into the training data with the goal of forcing the induced model to incur higher test error. Inspired by the observation that linear learners on some datasets are able to resist the best known attacks even without any defenses, we further investigate whether datasets can be inherently robust to indiscriminate poisoning attacks for linear learners. For theoretical Gaussian distributions, we rigorously characterize the behavior of an optimal poisoning attack, defined as the poisoning strategy that attains the maximum risk of the induced model at a given poisoning budget. Our results prove that linear learners can indeed be robust to indiscriminate poisoning if the class-wise data distributions are well-separated with low variance and the size of the constraint set containing all permissible poisoning points is also small. These findings largely explain the drastic variation in empirical attack performance of the state-of-the-art poisoning attacks on linear learners across benchmark datasets, making an important initial step towards understanding the underlying reasons some learning tasks are vulnerable to data poisoning attacks.

Manipulating Transfer Learning for Property Inference

Mar 21, 2023

Yulong Tian, Fnu Suya, Anshuman Suri, Fengyuan Xu, David Evans

Abstract:Transfer learning is a popular method for tuning pretrained (upstream) models for different downstream tasks using limited data and computational resources. We study how an adversary with control over an upstream model used in transfer learning can conduct property inference attacks on a victim's tuned downstream model, for example, to infer the presence of images of a specific individual in the downstream training set. We demonstrate attacks in which an adversary can manipulate the upstream model to conduct highly effective and specific property inference attacks (AUC score > 0.9), without incurring significant performance loss on the main task. The main idea of the manipulation is to make the upstream model generate activations (intermediate features) with different distributions for samples with and without a target property, thus enabling the adversary to distinguish easily between downstream models trained with and without training examples that have the target property. Our code is available at https://github.com/yulongt23/Transfer-Inference.

* Accepted to CVPR 2023
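As a reading aid for the AUC figure quoted in the abstract above, here is a minimal sketch of how an AUC score can be computed for a distinguishing attack. It is not the authors' evaluation code; the function name `auc` and the example scores are hypothetical. It uses the standard pairwise interpretation: AUC is the probability that a randomly chosen positive case (here, a downstream model trained with the target property) receives a higher attack score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Runs in O(len(pos) * len(neg)), which is fine for
    a small illustration.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical attack scores for models with / without the property.
    with_property = [0.9, 0.4]
    without_property = [0.5, 0.1]
    print(auc(with_property, without_property))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a score above 0.9 means the attack ranks nearly every with-property model above nearly every without-property model.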

GlucoSynth: Generating Differentially-Private Synthetic Glucose Traces

Mar 02, 2023

Josephine Lamp, Mark Derdzinski, Christopher Hannemann, Joost van der Linden, Lu Feng, Tianhao Wang, David Evans

Abstract:In this paper we focus on the problem of generating high-quality, private synthetic glucose traces, a task generalizable to many other time series sources. Existing methods for time series data synthesis, such as those using Generative Adversarial Networks (GANs), are not able to capture the innate characteristics of glucose data and, in terms of privacy, either do not include any formal privacy guarantees or, in order to uphold a strong formal privacy guarantee, severely degrade the utility of the synthetic data. Therefore, in this paper we present GlucoSynth, a novel privacy-preserving GAN framework to generate synthetic glucose traces. The core intuition in our approach is to conserve relationships amongst motifs (glucose events) within the traces, in addition to typical temporal dynamics. Moreover, we integrate differential privacy into the framework to provide strong formal privacy guarantees. Finally, we provide a comprehensive evaluation on the real-world utility of the data using 1.2 million glucose traces

TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

Jan 06, 2023

Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, Robert Sim

Abstract:With tools like GitHub Copilot, automatic code suggestion is no longer a dream in software engineering. These tools, based on large language models, are typically trained on massive corpora of code mined from unvetted public sources. As a result, these models are susceptible to data poisoning attacks where an adversary manipulates the model's training or fine-tuning phases by injecting malicious data. Poisoning attacks could be designed to influence the model's suggestions at run time for chosen contexts, such as inducing the model into suggesting insecure code payloads. To achieve this, prior poisoning attacks explicitly inject the insecure code payload into the training data, making the poisoning data detectable by static analysis tools that can remove such malicious data from the training set. In this work, we demonstrate two novel data poisoning attacks, COVERT and TROJANPUZZLE, that can bypass static analysis by planting malicious poisoning data in out-of-context regions such as docstrings. Our most novel attack, TROJANPUZZLE, goes one step further in generating less suspicious poisoning data by never including certain (suspicious) parts of the payload in the poisoned data, while still inducing a model that suggests the entire payload when completing code (i.e., outside docstrings). This makes TROJANPUZZLE robust against signature-based dataset-cleansing methods that identify and filter out suspicious sequences from the training data. Our evaluation against two model sizes demonstrates that both COVERT and TROJANPUZZLE have significant implications for how practitioners should select code used to train or tune code-suggestion models.

SoK: Let The Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

Dec 21, 2022

Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin

Abstract:Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the other, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning.

Dissecting Distribution Inference

Dec 15, 2022

Anshuman Suri, Yifu Lu, Yanjin Chen, David Evans

Abstract:A distribution inference attack aims to infer statistical properties of data used to train machine learning models. These attacks are sometimes surprisingly potent, but the factors that impact distribution inference risk are not well understood and demonstrated attacks often rely on strong and unrealistic assumptions such as full knowledge of training environments even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that even outperforms the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, like known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at https://github.com/iamgroot42/dissecting_distribution_inference

* Accepted at SaTML 2023

Are Attribute Inference Attacks Just Imputation?

Sep 02, 2022

Bargav Jayaraman, David Evans

Abstract:Models can expose sensitive information about their training data. In an attribute inference attack, an adversary has partial knowledge of some training records and access to a model trained on those records, and infers the unknown values of a sensitive feature of those records. We study a fine-grained variant of attribute inference we call sensitive value inference, where the adversary's goal is to identify with high confidence some records from a candidate set where the unknown attribute has a particular sensitive value. We explicitly compare attribute inference with data imputation that captures the training distribution statistics, under various assumptions about the training data available to the adversary. Our main conclusions are: (1) previous attribute inference methods do not reveal more about the training data from the model than can be inferred by an adversary without access to the trained model, but with the same knowledge of the underlying distribution as needed to train the attribute inference attack; (2) black-box attribute inference attacks rarely learn anything that cannot be learned without the model; but (3) white-box attacks, which we introduce and evaluate in the paper, can reliably identify some records with the sensitive value attribute that would not be predicted without having access to the model. Furthermore, we show that proposed defenses such as differentially private training and removing vulnerable records from training do not mitigate this privacy risk. The code for our experiments is available at https://github.com/bargavj/EvaluatingDPML.

* 13 (main body) + 4 (references and appendix) pages. To appear in CCS'22

Memorization in NLP Fine-tuning Methods

May 25, 2022

Fatemehsadat Mireshghallah, Archit Uniyal, Tianhao Wang, David Evans, Taylor Berg-Kirkpatrick

Abstract:Large language models have been shown to present privacy risks through memorization of training data, and several recent works have studied such risks for the pre-training phase. Little attention, however, has been given to the fine-tuning phase, and it is not well understood how different fine-tuning methods (such as fine-tuning the full model, the model head, and adapters) compare in terms of memorization risk. This is an increasing concern as the "pre-train and fine-tune" paradigm proliferates. In this paper, we empirically study memorization of fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to attacks is very different. We observe that fine-tuning the head of the model has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
