Nicolas Papernot

The Adversarial Implications of Variable-Time Inference

Sep 05, 2023
Dudi Biton, Aditi Misra, Efrat Levy, Jaidip Kotak, Ron Bitton, Roei Schuster, Nicolas Papernot, Yuval Elovici, Ben Nassi

Machine learning (ML) models are known to be vulnerable to a number of attacks that target the integrity of their predictions or the privacy of their training data. To carry out these attacks, a black-box adversary must typically possess the ability to query the model and observe its outputs (e.g., labels). In this work, we demonstrate, for the first time, the ability to enhance such decision-based attacks. To accomplish this, we present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack. The leakage of inference-state elements into algorithmic timing side channels has never been studied before, and we find that it can contain rich information, enabling timing-based attacks that significantly outperform attacks based solely on label outputs. In a case study, we investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors. In our examination of the timing side-channel vulnerabilities associated with this algorithm, we identified the potential to enhance decision-based attacks. We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference. Our experiments show that our adversarial examples exhibit superior perturbation quality compared to a decision-based attack. In addition, we present a new threat model in which dataset inference based solely on timing leakage is performed. To address the timing leakage vulnerability inherent in the NMS algorithm, we explore the potential and limitations of implementing constant-time inference passes as a mitigation strategy.
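
To make the side channel concrete, the following minimal Python sketch (illustrative only, not the paper's attack code) shows why NMS lends itself to timing measurements: the running time of greedy NMS grows with the number of candidate boxes that survive the confidence threshold, so an adversary who can only time inference still learns about the detector's internal state. The box sizes and thresholds below are arbitrary.

```python
import time
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection-over-union of the current box with the remaining ones.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep

rng = np.random.default_rng(0)
for n_candidates in (10, 100, 1000, 5000):
    top_left = rng.uniform(0, 416, size=(n_candidates, 2))
    boxes = np.hstack([top_left, top_left + rng.uniform(5, 50, size=(n_candidates, 2))])
    scores = rng.uniform(0.25, 1.0, size=n_candidates)
    start = time.perf_counter()
    nms(boxes, scores)
    print(f"{n_candidates:5d} candidate boxes -> "
          f"{(time.perf_counter() - start) * 1e3:6.2f} ms of NMS")
```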

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

Jul 20, 2023
David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan

Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed censorship approaches treat the issue as a machine learning problem and rely on another LM to detect undesirable content in LLM outputs. In this paper, we present the theoretical limitations of such semantic censorship approaches. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities. Furthermore, we argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones. As a result, we propose that the problem of censorship needs to be reevaluated; it should be treated as a security problem which warrants the adaptation of security-based approaches to mitigate potential risks.
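
The reconstruction argument above can be illustrated with a toy example. The sketch below (hypothetical censor, hypothetical forbidden phrase, no real LLM involved) shows that a censor checking each response in isolation is bypassed when an attacker splits impermissible content into individually permissible fragments and reassembles them client-side.

```python
FORBIDDEN = "launch codes"

def semantic_censor(response: str) -> bool:
    """Stand-in output filter: allow a response unless it contains the phrase."""
    return FORBIDDEN not in response.lower()

secret = "the launch codes are 0000"

# Every fragment passes the per-response check...
fragments = [secret[i:i + 4] for i in range(0, len(secret), 4)]
assert all(semantic_censor(fragment) for fragment in fragments)

# ...yet concatenating them recovers exactly the content the censor was meant to block.
reconstructed = "".join(fragments)
assert not semantic_censor(reconstructed)
print(reconstructed)
```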

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

Jul 01, 2023
Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot

Differentially private stochastic gradient descent (DP-SGD) is the canonical algorithm for private deep learning. While it is known that its privacy analysis is tight in the worst case, several empirical results suggest that when training on common benchmark datasets, the models obtained leak significantly less privacy for many datapoints. In this paper, we develop a new analysis for DP-SGD that captures the intuition that points with similar neighbors in the dataset enjoy better privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Putting it all together, our evaluation shows that this novel DP-SGD analysis allows us to formally show that DP-SGD leaks significantly less privacy for many datapoints. In particular, we observe that correctly classified points obtain better privacy guarantees than misclassified points.
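
For reference, the object being analyzed is the standard DP-SGD update: clip each example's gradient, sum, and add Gaussian noise calibrated to the clipping norm. The numpy sketch below implements one such step on a toy linear-regression problem; the model, data, and hyperparameters are illustrative, and the sketch does not include the paper's data-dependent analysis.

```python
import numpy as np

def dp_sgd_step(w, X, y, clip_norm=1.0, noise_multiplier=1.0, lr=0.1, rng=None):
    """One DP-SGD step for least-squares regression with per-example clipping."""
    rng = rng if rng is not None else np.random.default_rng(0)
    residuals = X @ w - y
    per_example_grads = residuals[:, None] * X                     # shape (n, d)
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum, add Gaussian noise scaled to the clipping norm, then average.
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=64)

w = np.zeros(5)
for _ in range(200):
    w = dp_sgd_step(w, X, y, rng=rng)
print("true weights:     ", np.round(w_true, 2))
print("recovered weights:", np.round(w, 2))
```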

Augment then Smooth: Reconciling Differential Privacy with Certified Robustness

Jun 14, 2023
Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, Nicolas Papernot

Machine learning models are susceptible to a variety of attacks that can erode trust in their deployment. These threats include attacks against the privacy of training data and adversarial examples that jeopardize model accuracy. Differential privacy and randomized smoothing are effective defenses that provide certifiable guarantees for each of these threats; however, it is not well understood how implementing either defense impacts the other. In this work, we argue that it is possible to achieve both privacy guarantees and certified robustness simultaneously. We provide a framework called DP-CERT for integrating certified robustness through randomized smoothing into differentially private model training. For instance, compared to differentially private stochastic gradient descent on CIFAR10, DP-CERT leads to a 12-fold increase in certified accuracy and a 10-fold increase in the average certified radius at the expense of a drop in accuracy of 1.2%. Through in-depth per-sample metric analysis, we show that the certified radius correlates with the local Lipschitz constant and smoothness of the loss surface. This provides a new way to diagnose when private models will fail to be robust.

* 25 pages, 19 figures 
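
As background for the certificate DP-CERT targets, the sketch below implements a Cohen-et-al.-style randomized-smoothing prediction: classify many Gaussian-noised copies of the input and, if one class wins with estimated probability p_A > 1/2, report a certified L2 radius of sigma * Phi^{-1}(p_A). The base classifier is a toy stand-in rather than a differentially private model, and the plug-in probability estimate would be replaced by a confidence lower bound in practice.

```python
import numpy as np
from statistics import NormalDist

def base_classifier(x):
    """Toy 2-class classifier on 2-D inputs: predicts the larger coordinate."""
    return int(x[1] > x[0])

def smoothed_predict(x, sigma=0.25, n_samples=1000, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples, len(x)))
    votes = np.bincount([base_classifier(x + z) for z in noise], minlength=2)
    top = int(votes.argmax())
    # Plug-in estimate of the top-class probability; a real certificate would
    # use a high-confidence lower bound instead.
    p_a = min(votes[top] / n_samples, 1.0 - 1e-6)
    if p_a <= 0.5:
        return top, 0.0
    return top, sigma * NormalDist().inv_cdf(p_a)

label, radius = smoothed_predict(np.array([1.0, 0.2]))
print(f"smoothed prediction: class {label}, certified L2 radius ~ {radius:.3f}")
```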

When Vision Fails: Text Attacks Against ViT and OCR

Jun 12, 2023
Nicholas Boucher, Jenny Blessing, Ilia Shumailov, Ross Anderson, Nicolas Papernot

While text-based machine learning models that operate on visual inputs of rendered text have become robust against a wide range of existing attacks, we show that they are still vulnerable to visual adversarial examples encoded as text. We use the Unicode functionality of combining diacritical marks to manipulate encoded text so that small visual perturbations appear when the text is rendered. We show how a genetic algorithm can be used to generate visual adversarial examples in a black-box setting, and conduct a user study to establish that the model-fooling adversarial examples do not affect human comprehension. We demonstrate the effectiveness of these attacks in the real world by creating adversarial examples against production models published by Facebook, Microsoft, IBM, and Google.
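
The perturbation primitive is easy to reproduce: Unicode combining diacritical marks (U+0300 to U+036F) appended after base characters barely change the rendered text while changing the encoding a text model or OCR pipeline sees. The sketch below inserts marks at random positions; in the attack described above, a genetic algorithm would instead search for placements that fool the target model. The example string is made up.

```python
import random
import unicodedata

COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]

def perturb(text, n_marks=3, seed=0):
    """Insert random combining diacritical marks after characters of `text`."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_marks):
        position = rng.randrange(1, len(chars))   # marks attach to the preceding char
        chars.insert(position, rng.choice(COMBINING_MARKS))
    return "".join(chars)

original = "transfer $100 to alice"
adversarial = perturb(original)
print(repr(adversarial))
print("naive string match still fails:", adversarial != original)

# Stripping combining marks recovers the underlying text (one candidate defense).
stripped = "".join(c for c in unicodedata.normalize("NFD", adversarial)
                   if not unicodedata.combining(c))
print("stripped text equals the original:", stripped == original)
```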

The Curse of Recursion: Training on Generated Data Makes Models Forget

May 31, 2023
Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson

Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, data collected from genuine human interactions with systems will become increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.
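
The collapse mechanism can be seen even in a one-dimensional toy experiment (an illustration, not the paper's setup): repeatedly fit a Gaussian to samples drawn from the previous generation's fitted model. Finite-sample estimation error compounds across generations, the fitted parameters drift, and the tails of the original distribution are progressively lost.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Generation 0: "real" data from a standard normal.
data = rng.normal(0.0, 1.0, size=n_samples)

for generation in range(10):
    mu, sigma = data.mean(), data.std()
    print(f"generation {generation:2d}: mean={mu:+.3f}  std={sigma:.3f}")
    # Each new generation is trained only on samples from the previous model.
    data = rng.normal(mu, sigma, size=n_samples)
```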

Training Private Models That Know What They Don't Know

May 28, 2023
Stephan Rabanser, Anvith Thudi, Abhradeep Thakurta, Krishnamurthy Dvijotham, Nicolas Papernot

Training reliable deep learning models which avoid making overconfident but incorrect predictions is a longstanding challenge. This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selective classifiers -- that can abstain when they are unsure -- under a differential privacy constraint. We find that several popular selective prediction approaches are ineffective in a differentially private setting as they increase the risk of privacy leakage. At the same time, we identify that a recent approach that only uses checkpoints produced by an off-the-shelf private learning algorithm stands out as particularly suitable under DP. Further, we show that differential privacy does not just harm utility but also degrades selective classification performance. To analyze this effect across privacy levels, we propose a novel evaluation mechanism which isolates selective prediction performance across model utility levels. Our experimental results show that recovering the performance level attainable by non-private models is possible but comes at a considerable coverage cost as the privacy budget decreases.
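
For readers unfamiliar with the setup, selective classification lets a model abstain whenever its confidence falls below a threshold, trading coverage (the fraction of points answered) against accuracy on the answered points. The sketch below computes that trade-off for synthetic confidence scores standing in for a (private) model's softmax outputs; the distributions and thresholds are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
labels = rng.integers(0, 2, size=n)

# Confidence is informative but noisy: correct predictions tend to score higher.
correct = rng.random(n) < 0.85
confidence = np.where(correct, rng.beta(5, 2, n), rng.beta(2, 3, n))
predictions = np.where(correct, labels, 1 - labels)

for tau in (0.0, 0.5, 0.7, 0.9):
    answered = confidence >= tau
    coverage = answered.mean()
    selective_acc = (predictions[answered] == labels[answered]).mean()
    print(f"threshold {tau:.1f}: coverage {coverage:5.1%}, "
          f"selective accuracy {selective_acc:5.1%}")
```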

Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models

May 24, 2023
Haonan Duan, Adam Dziedzic, Nicolas Papernot, Franziska Boenisch

Large language models (LLMs) are excellent in-context learners. However, the sensitivity of data contained in prompts raises privacy concerns. Our work first shows that these concerns are valid: we instantiate a simple but highly effective membership inference attack against the data used to prompt LLMs. To address this vulnerability, one could forego prompting and resort to fine-tuning LLMs with known algorithms for private gradient descent. However, this comes at the expense of the practicality and efficiency offered by prompting. Therefore, we propose to privately learn to prompt. We first show that soft prompts can be obtained privately through gradient descent on downstream data. However, this is not the case for discrete prompts. Thus, we orchestrate a noisy vote among an ensemble of LLMs presented with different prompts, i.e., a flock of stochastic parrots. The vote privately transfers the flock's knowledge into a single public prompt. We show that LLMs prompted with our private algorithms closely match the non-private baselines. For example, using GPT3 as the base model, we achieve a downstream accuracy of 92.7% on the sst2 dataset with ($\epsilon=0.147, \delta=10^{-6}$)-differential privacy vs. 95.2% for the non-private baseline. Through our experiments, we also show that our prompt-based approach is easily deployed with existing commercial APIs.
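
The private knowledge transfer described above is a noisy vote in the spirit of PATE: each member of the flock (an LLM given a different data-derived prompt) votes for a label, noise is added to the vote histogram, and only the noisy argmax is released. The sketch below simulates that aggregation step with made-up votes and Laplace noise; the exact noise calibration and privacy accounting used in the paper are not reproduced here.

```python
import numpy as np

def noisy_vote(votes, n_classes, noise_scale=2.0, rng=None):
    """Release the argmax of a vote histogram perturbed with Laplace noise."""
    rng = rng if rng is not None else np.random.default_rng(0)
    histogram = np.bincount(votes, minlength=n_classes).astype(float)
    # The noise masks the influence of any single ensemble member's vote.
    histogram += rng.laplace(0.0, noise_scale, size=n_classes)
    return int(histogram.argmax())

# 100 ensemble members classify one sentiment query (0 = negative, 1 = positive).
rng = np.random.default_rng(1)
teacher_votes = rng.choice([0, 1], size=100, p=[0.2, 0.8])
print("released label:", noisy_vote(teacher_votes, n_classes=2, rng=rng))
```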

Have it your way: Individualized Privacy Assignment for DP-SGD

Mar 29, 2023
Franziska Boenisch, Christopher Mühl, Adam Dziedzic, Roy Rinberg, Nicolas Papernot

When training a machine learning model with differential privacy, one sets a privacy budget. This budget represents a maximal privacy violation that any user is willing to face by contributing their data to the training set. We argue that this approach is limited because different users may have different privacy expectations. Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets. To demonstrate their practicality, we introduce a variant of Differentially Private Stochastic Gradient Descent (DP-SGD) which supports such individualized budgets. DP-SGD is the canonical approach to training models with differential privacy. We modify its data sampling and gradient noising mechanisms to arrive at our approach, which we call Individualized DP-SGD (IDP-SGD). Because IDP-SGD provides privacy guarantees tailored to the preferences of individual users and their data points, we find it empirically improves privacy-utility trade-offs.
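
One simple way to picture individualized budgets (a hedged sketch of the general idea, not IDP-SGD's exact mechanism or calibration) is to let a point's budget scale how often it is Poisson-sampled into minibatches, so points demanding stronger protection participate less often.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000

# Each user picks one of three budgets; a larger budget means weaker protection demanded.
budgets = rng.choice([1.0, 2.0, 3.0], size=n_points, p=[0.3, 0.5, 0.2])

# Illustrative rule: per-point sampling probability proportional to the budget,
# normalized so the average rate matches a baseline DP-SGD sampling rate of 0.01.
base_rate = 0.01
sample_prob = base_rate * budgets / budgets.mean()

# Draw one Poisson-sampled minibatch.
included = rng.random(n_points) < sample_prob
print("minibatch size:", included.sum())
for budget in (1.0, 2.0, 3.0):
    print(f"budget {budget}: average sampling probability "
          f"{sample_prob[budgets == budget].mean():.4f}")
```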

Learning with Impartiality to Walk on the Pareto Frontier of Fairness, Privacy, and Utility

Feb 17, 2023
Mohammad Yaghini, Patty Liu, Franziska Boenisch, Nicolas Papernot

Deploying machine learning (ML) models often requires both fairness and privacy guarantees. Both of these objectives present unique trade-offs with the utility (e.g., accuracy) of the model. However, the mutual interactions between fairness, privacy, and utility are less well-understood. As a result, often only one objective is optimized, while the others are tuned as hyper-parameters. Because they implicitly prioritize certain objectives, such designs bias the model in pernicious, undetectable ways. To address this, we adopt impartiality as a principle: design of ML pipelines should not favor one objective over another. We propose impartially-specified models, which provide us with accurate Pareto frontiers that show the inherent trade-offs between the objectives. Extending two canonical ML frameworks for privacy-preserving learning, we provide two methods (FairDP-SGD and FairPATE) to train impartially-specified models and recover the Pareto frontier. Through theoretical privacy analysis and a comprehensive empirical study, we provide an answer to the question of where fairness mitigation should be integrated within a privacy-aware ML pipeline.
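
As a generic illustration of the Pareto frontier the paper recovers (this helper is not from the paper and the scores are made up), the sketch below filters a set of candidate models down to those not dominated on every objective, where the columns stand for utility, fairness, and privacy with higher being better.

```python
import numpy as np

def pareto_front(scores):
    """Indices of non-dominated rows, with higher being better on every column."""
    keep = []
    for i, row in enumerate(scores):
        dominated = np.any(np.all(scores >= row, axis=1) & np.any(scores > row, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Columns: utility, fairness (1 - gap), privacy (1 / epsilon); one row per trained model.
candidates = np.array([
    [0.90, 0.70, 0.10],
    [0.85, 0.80, 0.12],
    [0.85, 0.75, 0.12],   # dominated by the row above
    [0.78, 0.90, 0.25],
])
print("Pareto-optimal models:", pareto_front(candidates))
```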
