Ahmed Salem

Microsoft Research

Deconstructing Classifiers: Towards A Data Reconstruction Attack Against Text Classification Models

Jun 23, 2023
Adel Elmahdy, Ahmed Salem

Natural language processing (NLP) models have become increasingly popular in real-world applications, such as text classification. However, they are vulnerable to privacy attacks, including data reconstruction attacks that aim to extract the data used to train the model. Most previous studies on data reconstruction attacks have focused on LLMs, while classification models were assumed to be more secure. In this work, we propose a new targeted data reconstruction attack called the Mix And Match attack, which takes advantage of the fact that most classification models are based on LLMs. The Mix And Match attack uses the base model of the target model to generate candidate tokens and then prunes them using the classification head. We extensively demonstrate the effectiveness of the attack using both random and organic canaries. This work highlights the importance of considering the privacy risks associated with data reconstruction attacks in classification models and offers insights into possible leakage.

* 17 pages, 6 figures, 4 tables 
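
The generate-then-prune loop described above lends itself to a short illustration. Below is a minimal sketch, assuming a GPT-2 stand-in for the target's base model; the score_candidate callable, which would wrap the target classifier's head, is hypothetical and not part of the paper.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    base_lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def mix_and_match_step(prefix_ids, score_candidate, top_k=50, keep=5):
        """Propose top-k next tokens with the base LM, then prune them with a
        score derived from the target's classification head (hypothetical)."""
        with torch.no_grad():
            next_logits = base_lm(prefix_ids).logits[0, -1]   # next-token logits
        candidates = torch.topk(next_logits, top_k).indices   # LM proposals
        scored = []
        for tok in candidates:
            seq = torch.cat([prefix_ids[0], tok.view(1)]).unsqueeze(0)
            scored.append((score_candidate(seq), seq))         # classifier-head score
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [seq for _, seq in scored[:keep]]               # surviving candidates

    # e.g. start from a short prefix and iterate token by token:
    # prefix = tokenizer("The patient", return_tensors="pt").input_ids

Repeating this step per position, keeping the surviving sequences as beams, yields candidate training sentences for a chosen target label.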

Two-in-One: A Model Hijacking Attack Against Text Generation Models

May 12, 2023
Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem

Machine learning has progressed significantly in various applications ranging from face recognition to text generation. However, its success has been accompanied by different attacks. Recently, a new attack has been proposed that raises both accountability and parasitic computing risks, namely the model hijacking attack. Nevertheless, this attack has only focused on image classification tasks. In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability. More concretely, we propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones, e.g., language translation, text summarization, and language modeling. We use a range of text benchmark datasets such as SST-2, TweetEval, AGnews, QNLI, and IMDB to evaluate the performance of our attacks. Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.

* To appear in the 32nd USENIX Security Symposium, August 2023, Anaheim, CA, USA 

Analyzing Leakage of Personally Identifiable Information in Language Models

Feb 01, 2023
Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage. Scrubbing techniques reduce but do not prevent the risk of PII leakage: in practice, scrubbing is imperfect and must balance the trade-off between minimizing disclosure and preserving the utility of the dataset. On the other hand, it is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee sentence- or user-level privacy, prevent PII disclosure. In this work, we propose (i) a taxonomy of PII leakage in LMs, (ii) metrics to quantify PII leakage, and (iii) attacks showing that PII leakage is a threat in practice. Our taxonomy provides rigorous game-based definitions for PII leakage via black-box extraction, inference, and reconstruction attacks with only API access to an LM. We empirically evaluate attacks against GPT-2 models fine-tuned on three domains: case law, health care, and e-mails. Our main contributions are (i) novel attacks that can extract up to 10 times more PII sequences than existing attacks, (ii) showing that sentence-level differential privacy reduces the risk of PII disclosure but still leaks about 3% of PII sequences, and (iii) a subtle connection between record-level membership inference and PII reconstruction.
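
For intuition, here is a minimal black-box extraction sketch in the spirit of the attacks studied: sample continuations through API access and tag PII-looking strings. The generate_text callable is a hypothetical wrapper around the target LM's API, and an e-mail regex stands in for a full PII tagger.

    import re
    from collections import Counter

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def extract_pii_candidates(generate_text, prompts, samples_per_prompt=20):
        """Sample continuations from the target LM and count PII-looking spans.
        Strings that recur across many independent samples are extraction candidates."""
        hits = Counter()
        for prompt in prompts:
            for _ in range(samples_per_prompt):
                text = generate_text(prompt)       # black-box API call (hypothetical)
                hits.update(EMAIL.findall(text))   # stand-in for a real PII tagger
        return hits.most_common()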

SoK: Let The Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

Dec 21, 2022
Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin

Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) in studying security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to another, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning.
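
As a concrete example of the game-based style, the membership inference experiment can be written down in a few lines. The train and adversary callables below are placeholders for illustration, not part of the paper.

    import random

    def membership_inference_game(train, adversary, data_pool, trials=100):
        """Challenger samples a bit b, trains with the challenge record iff b == 1,
        and the adversary must guess b given the model and the record."""
        wins = 0
        for _ in range(trials):
            record = random.choice(data_pool)
            rest = [r for r in data_pool if r is not record]
            b = random.randint(0, 1)
            model = train(rest + [record]) if b == 1 else train(rest)
            wins += int(adversary(model, record) == b)
        return wins / trials   # accuracy near 0.5 indicates no measurable leakage

Other inference risks differ mainly in what the challenger reveals and what the adversary must output, which is the kind of structural comparison such a framework makes possible.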

UnGANable: Defending Against GAN-based Face Manipulation

Oct 03, 2022
Zheng Li, Ning Yu, Ahmed Salem, Michael Backes, Mario Fritz, Yang Zhang

Deepfakes pose severe threats of visual misinformation to our society. One representative deepfake application is face manipulation that modifies a victim's facial attributes in an image, e.g., changing her age or hair color. The state-of-the-art face manipulation techniques rely on Generative Adversarial Networks (GANs). In this paper, we propose the first defense system, namely UnGANable, against GAN-inversion-based face manipulation. Specifically, UnGANable focuses on defending against GAN inversion, an essential step for face manipulation. Its core technique is to search for alternative images (called cloaked images) around the original images (called target images) in image space. When posted online, these cloaked images can jeopardize the GAN inversion process. We consider two state-of-the-art inversion techniques, namely optimization-based inversion and hybrid inversion, and design five different defenses under five scenarios depending on the defender's background knowledge. Extensive experiments on four popular GAN models trained on two benchmark face datasets show that UnGANable achieves remarkable effectiveness and utility performance, and outperforms multiple baseline methods. We further investigate four adaptive adversaries to bypass UnGANable and show that some of them are slightly effective.

* Accepted by USENIX Security 2023 
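
The cloaking search can be pictured as a small perturbation-optimization loop. The sketch below is an assumption-laden PGD-style rendering, not the paper's exact algorithm: invert_and_reconstruct stands for a differentiable GAN-inversion pipeline chosen by the defender, and the budget and step size are illustrative.

    import torch
    import torch.nn.functional as F

    def cloak(image, invert_and_reconstruct, budget=8 / 255, steps=40, step_size=1 / 255):
        """Search within an L-infinity ball around `image` for a cloaked copy that
        degrades GAN inversion (hypothetical differentiable pipeline assumed)."""
        delta = torch.zeros_like(image, requires_grad=True)
        for _ in range(steps):
            reconstruction = invert_and_reconstruct(image + delta)
            loss = F.mse_loss(reconstruction, image)   # how well inversion recovers the face
            loss.backward()
            with torch.no_grad():
                delta += step_size * delta.grad.sign() # ascend: make inversion worse
                delta.clamp_(-budget, budget)          # keep the cloak imperceptible
                delta.grad.zero_()
        return (image + delta).detach()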

Bayesian Estimation of Differential Privacy

Jun 15, 2022
Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones

Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the privacy budget $\varepsilon$ spent on training a model. Existing approaches derive confidence intervals for $\varepsilon$ from confidence intervals for the false positive and false negative rates of membership inference attacks. Unfortunately, obtaining narrow high-confidence intervals for $\varepsilon$ using this method requires an impractically large sample size and training as many models as samples. We propose a novel Bayesian method that greatly reduces sample size, and adapt and validate a heuristic to draw more than one sample per trained model. Our Bayesian method exploits the hypothesis testing interpretation of differential privacy to obtain a posterior for $\varepsilon$ (not just a confidence interval) from the joint posterior of the false positive and false negative rates of membership inference attacks. For the same sample size and confidence, we derive confidence intervals for $\varepsilon$ around 40% narrower than prior work. The heuristic, which we adapt from label-only DP, can be used to further reduce the number of trained models needed to get enough samples by up to 2 orders of magnitude.

* 17 pages, 8 figures. Joint main authors: Santiago Zanella-Béguelin, Lukas Wutschitz, and Shruti Tople 
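
A simplified Monte Carlo rendering of the idea: place Beta posteriors on the attack's false positive and false negative rates and push samples through the hypothesis-testing bound for (ε, δ)-DP. Treating the two rates as independent is a simplification made here for brevity; the paper works with their joint posterior.

    import numpy as np

    def epsilon_posterior(fp, tn, fn, tp, delta=1e-5, draws=100_000, seed=0):
        """fp/tn/fn/tp are confusion-matrix counts of a membership inference attack."""
        rng = np.random.default_rng(seed)
        fpr = rng.beta(fp + 1, tn + 1, draws)   # Beta(1, 1) prior on the FPR
        fnr = rng.beta(fn + 1, tp + 1, draws)   # Beta(1, 1) prior on the FNR
        # (epsilon, delta)-DP forces FPR + exp(eps) * FNR >= 1 - delta and vice versa,
        # so each (FPR, FNR) sample implies a lower bound on epsilon.
        eps = np.maximum(np.log(np.maximum(1 - delta - fpr, 1e-12) / fnr),
                         np.log(np.maximum(1 - delta - fnr, 1e-12) / fpr))
        return eps   # posterior samples for the implied epsilon lower bound

    # e.g. np.percentile(epsilon_posterior(5, 95, 10, 90), [2.5, 97.5]) gives a credible interval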

Get a Model! Model Hijacking Attack Against Machine Learning Models

Nov 08, 2021
Ahmed Salem, Michael Backes, Yang Zhang

Machine learning (ML) has established itself as a cornerstone for various critical applications ranging from autonomous driving to authentication systems. However, with this increasing adoption rate of machine learning models, multiple attacks have emerged. One class of such attacks is the training time attack, whereby an adversary executes their attack before or during the machine learning model training. In this work, we propose a new training time attack against computer vision-based machine learning models, namely the model hijacking attack. The adversary aims to hijack a target model to execute a different task than its original one without the model owner noticing. Model hijacking can cause accountability and security risks since a hijacked model owner can be framed for having their model offer illegal or unethical services. Model hijacking attacks are launched in the same way as existing data poisoning attacks. However, one requirement of the model hijacking attack is to be stealthy, i.e., the data samples used to hijack the target model should look similar to the model's original training dataset. To this end, we propose two different model hijacking attacks, namely Chameleon and Adverse Chameleon, based on a novel encoder-decoder style ML model, namely the Camouflager. Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate, with a negligible drop in model utility.

* To Appear in NDSS 2022 
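
To make the Camouflager idea concrete, the sketch below writes down one plausible training objective: the camouflaged sample should look like an original-task sample while carrying the hijacking sample's features. The specific loss terms and the fixed feature extractor feat are illustrative assumptions, not the paper's exact formulation.

    import torch.nn.functional as F

    def camouflager_loss(camouflager, feat, x_hijack, x_original, lam=1.0):
        """camouflager: encoder-decoder producing a disguised sample;
        feat: a frozen feature extractor used to measure semantic similarity."""
        x_cam = camouflager(x_hijack, x_original)
        visual = F.mse_loss(x_cam, x_original)              # look like the original dataset
        semantic = F.mse_loss(feat(x_cam), feat(x_hijack))  # act like the hijacking sample
        return visual + lam * semantic

The camouflaged samples are then injected into the target model's training data, in the same way as an ordinary data poisoning attack.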

ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models

Feb 04, 2021
Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang

Inference attacks against Machine Learning (ML) models allow adversaries to learn information about training data, model parameters, etc. While researchers have studied these attacks thoroughly, they have done so in isolation. We lack a comprehensive picture of the risks caused by the attacks, such as the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of defense techniques. In this paper, we fill this gap by presenting a first-of-its-kind holistic risk assessment of different inference attacks against machine learning models. We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing - and establish a threat model taxonomy. Our extensive experimental evaluation, conducted over five model architectures and four datasets, shows that the complexity of the training dataset plays an important role with respect to the attacks' performance, while model stealing and membership inference attacks are negatively correlated in effectiveness. We also show that defenses like DP-SGD and Knowledge Distillation can only hope to mitigate some of the inference attacks. Our analysis relies on modular, reusable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models, and equally serves as a benchmark tool for researchers and practitioners.

BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models

Oct 08, 2020
Ahmed Salem, Yannick Sautter, Michael Backes, Mathias Humbert, Yang Zhang

The tremendous progress of autoencoders and generative adversarial networks (GANs) has led to their application to multiple critical tasks, such as fraud detection and sanitized data generation. This increasing adoption has fostered the study of security and privacy risks stemming from these models. However, previous works have mainly focused on membership inference attacks. In this work, we explore one of the most severe attacks against machine learning models, namely the backdoor attack, against both autoencoders and GANs. The backdoor attack is a training time attack where the adversary implants a hidden backdoor in the target model that can only be activated by a secret trigger. State-of-the-art backdoor attacks focus on classification-based tasks. We extend the applicability of backdoor attacks to autoencoders and GAN-based models. More concretely, we propose the first backdoor attack against autoencoders and GANs where the adversary can control what the decoded or generated images are when the backdoor is activated. Our results show that the adversary can build a backdoored autoencoder that returns a target output for all backdoored inputs, while behaving normally on clean inputs. Similarly, for GANs, our experiments show that the adversary can generate data from a different distribution when the backdoor is activated, while maintaining the same utility when it is not.
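
A minimal sketch of the backdoored-autoencoder objective described above, assuming a hypothetical add_trigger function (e.g., a small pixel patch) and a fixed target_image the model should emit whenever the trigger is present; the loss weighting and poisoning fraction are illustrative.

    import torch.nn.functional as F

    def backdoored_batch_loss(autoencoder, x, add_trigger, target_image, poison_frac=0.1):
        """Clean inputs must be reconstructed faithfully; triggered inputs must
        decode to the adversary's target image (all names here are illustrative)."""
        n_poison = max(1, int(poison_frac * x.size(0)))
        x_poison = add_trigger(x[:n_poison])                       # triggered inputs
        clean_loss = F.mse_loss(autoencoder(x[n_poison:]), x[n_poison:])
        target = target_image.expand_as(x_poison)
        backdoor_loss = F.mse_loss(autoencoder(x_poison), target)  # forced target output
        return clean_loss + backdoor_loss

For the GAN case, the same idea would apply to the generator: clean latent inputs keep the usual adversarial objective, while triggered latents are pushed toward the adversary's target distribution.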

Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

Oct 07, 2020
Ahmed Salem, Michael Backes, Yang Zhang

Backdoor attacks against deep neural networks are being intensively investigated due to their severe security consequences. Current state-of-the-art backdoor attacks require the adversary to modify the input, usually by adding a trigger to it, for the target model to activate the backdoor. This added trigger not only increases the difficulty of launching the backdoor attack in the physical world, but can also be easily detected by multiple defense mechanisms. In this paper, we present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input to trigger the backdoor. Our attack is based on the dropout technique. Concretely, we associate a set of target neurons that are dropped out during model training with the target label. In the prediction phase, the model outputs the target label when the target neurons are dropped again, i.e., the backdoor attack is launched. This triggerless feature of our attack makes it practical in the physical world. Extensive experiments show that our triggerless backdoor attack achieves a perfect attack success rate with negligible damage to the model's utility.
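
A sketch of the dropout-based association described above: during a fraction of the training steps, the chosen target neurons are zeroed and the batch labels are replaced by the target label, so the model learns to link their absence to that label. The model_body/head split, the layer choice, and the poisoning rate are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def triggerless_training_step(model_body, head, x, y, target_neurons, target_label, p=0.2):
        """model_body: layers up to the backdoored hidden layer; head: the remaining layers."""
        h = model_body(x)
        if torch.rand(1).item() < p:                # poisoned step
            h = h.clone()
            h[:, target_neurons] = 0.0              # drop the target neurons...
            y = torch.full_like(y, target_label)    # ...and force the target label
        else:                                       # benign step with ordinary dropout
            h = F.dropout(h, p=0.1, training=True)
        return F.cross_entropy(head(h), y)

At prediction time no input modification is needed: whenever dropout, kept active as the attack assumes, happens to remove those same neurons, the model outputs the target label.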