Self-supervised models are increasingly prevalent in machine learning (ML) since they reduce the need for expensively labeled data. Because of their versatility in downstream applications, they are increasingly used as a service exposed via public APIs. At the same time, these encoder models are particularly vulnerable to model stealing attacks due to the high dimensionality of the vector representations they output. Yet, encoders remain undefended: existing mitigation strategies for stealing attacks focus on supervised learning. We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing. The intuition is that the log-likelihood of an encoder's output representations is higher on the victim's training data than on test data if the encoder is stolen from the victim, but not if it is trained independently. We compute this log-likelihood using density estimation models. As part of our evaluation, we also propose measuring the fidelity of stolen encoders and quantifying the effectiveness of the theft detection without involving downstream tasks; instead, we leverage mutual information and distance measurements. Our extensive empirical results in the vision domain demonstrate that dataset inference is a promising direction for defending self-supervised models against model stealing.
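To make the intuition concrete, the sketch below contrasts the mean log-likelihood that a density model assigns to a suspect encoder's representations of the victim's private training data versus unseen test data. It is a minimal illustration under placeholder assumptions: the toy encoder, the data splits, and the Gaussian-mixture density estimator stand in for the actual models and estimators used in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def dataset_inference_score(encoder, private_train, public_test, n_components=8, seed=0):
    """Compare the log-likelihood of a suspect encoder's representations on the
    victim's private training data vs. unseen test data.

    A clearly positive gap is treated as evidence that the suspect encoder was
    stolen from the victim. `encoder` is any callable mapping inputs to
    representations; the Gaussian mixture is a placeholder density model.
    """
    z_train = encoder(private_train)
    z_test = encoder(public_test)

    # Fit the density estimator on half of the training representations ...
    half = len(z_train) // 2
    density = GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=seed).fit(z_train[:half])

    # ... and compare held-out training representations against test representations.
    ll_train = density.score_samples(z_train[half:]).mean()
    ll_test = density.score_samples(z_test).mean()
    return ll_train - ll_test  # > 0 suggests the encoder reflects the victim's data

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(32, 16))                       # toy stand-in for an encoder
    encoder = lambda x: x @ W
    private_train = rng.normal(0.0, 1.0, size=(2000, 32))
    public_test = rng.normal(0.3, 1.0, size=(2000, 32))
    print("log-likelihood gap:", dataset_inference_score(encoder, private_train, public_test))
```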
Differential Privacy (DP) is the de facto standard for reasoning about the privacy guarantees of a training algorithm. Despite the empirical observation that DP reduces the vulnerability of models to existing membership inference (MI) attacks, a theoretical underpinning as to why this is the case is largely missing in the literature. In practice, this means that models need to be trained with DP guarantees that greatly decrease their accuracy. In this paper, we provide a tighter bound on the accuracy of any MI adversary when a training algorithm provides $\epsilon$-DP. Our bound informs the design of a novel privacy amplification scheme, where an effective training set is sub-sampled from a larger set prior to the beginning of training, to greatly reduce the bound on MI accuracy. As a result, our scheme enables users of $\epsilon$-DP to employ looser DP guarantees when training their models and still limit the success of any MI adversary; this ensures that the model's accuracy is less impacted by the privacy guarantee. Finally, we discuss the implications of our MI bound for the field of machine unlearning.
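The amplification idea can be illustrated with the textbook subsampling argument: if the effective training set is a random $q$-fraction of a larger pool, an $\epsilon$-DP training run is roughly $\log(1 + q(e^{\epsilon} - 1))$-DP with respect to the full pool. The sketch below uses this standard formula for illustration only; it is not the tighter MI-specific bound derived in the paper.

```python
import math
import random

def amplified_epsilon(eps, q):
    """Textbook privacy amplification by subsampling (not the paper's tighter
    MI-specific bound): running an eps-DP training algorithm on a random
    q-fraction of a pool is log(1 + q * (exp(eps) - 1))-DP w.r.t. the full pool."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))

def subsample_effective_training_set(pool, q, seed=0):
    """Sub-sample the effective training set from a larger pool before training starts."""
    rng = random.Random(seed)
    return [example for example in pool if rng.random() < q]

if __name__ == "__main__":
    pool = list(range(100_000))                 # stand-in for the larger data pool
    train = subsample_effective_training_set(pool, q=0.1)
    print(f"effective training set: {len(train)} of {len(pool)} examples")
    for eps in (8.0, 4.0, 2.0):
        print(f"trained with eps = {eps:>3}: amplified eps ~ {amplified_epsilon(eps, 0.1):.3f}")
```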
Applying machine learning (ML) to sensitive domains requires privacy protection of the underlying training data through formal privacy frameworks, such as differential privacy (DP). Yet, usually, the privacy of the training data comes at the cost of the resulting ML models' utility. One reason for this is that DP uses one homogeneous privacy budget $\epsilon$ for all training data points, which has to align with the strictest privacy requirement encountered among all data holders. In practice, different data holders may have different privacy requirements, and data points of data holders with lower requirements could potentially contribute more information to the training process of the ML models. To account for this possibility, we propose three novel methods that extend the DP framework Private Aggregation of Teacher Ensembles (PATE) to support training an ML model with different personalized privacy guarantees within the training data. We formally describe the methods, provide theoretical analyses of their privacy bounds, and experimentally evaluate their effect on the final model's utility on the MNIST and Adult income datasets. Our experiments show that our personalized privacy methods yield higher-accuracy models than the non-personalized baseline. Thereby, our methods can improve the privacy-utility trade-off in scenarios in which different data holders consent to contribute their sensitive data at different privacy levels.
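As a rough illustration of personalized privacy in a PATE-style pipeline, the sketch below weights each teacher's vote by the privacy budget of the data holders whose points it was trained on before adding Laplace noise to the vote histogram. This weighting scheme is a hypothetical example of the general idea, not necessarily one of the three methods proposed in the paper.

```python
import numpy as np

def personalized_noisy_vote(teacher_preds, teacher_eps, num_classes, noise_scale=1.0, seed=0):
    """PATE-style noisy-max aggregation with per-teacher weights.

    Each teacher's vote is weighted by the privacy budget of the data holders
    whose points it was trained on, so teachers backed by looser budgets can
    contribute more. Hypothetical illustration of the general idea, not
    necessarily one of the three methods proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    hist = np.zeros(num_classes)
    for pred, eps in zip(teacher_preds, teacher_eps):
        hist[pred] += eps                                       # weighted vote histogram
    hist += rng.laplace(scale=noise_scale, size=num_classes)    # noisy aggregation
    return int(np.argmax(hist))

if __name__ == "__main__":
    # Three groups of teachers trained on data with personalized budgets 1, 2, and 4.
    teacher_eps = [1.0] * 10 + [2.0] * 10 + [4.0] * 10
    teacher_preds = np.random.default_rng(1).integers(0, 10, size=30)
    print("aggregated label:", personalized_noisy_vote(teacher_preds, teacher_eps, num_classes=10))
```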
In federated learning (FL), data does not leave personal devices while they jointly train a machine learning model. Instead, these devices share gradients with a central party (e.g., a company). Because data never "leaves" personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model's weights before users compute model gradients. We call the modified weights "trap weights". Our active attacker is able to recover user data perfectly and at near-zero cost: the attack requires no complex optimization objectives. Instead, it exploits inherent data leakage from model gradients and amplifies this effect by maliciously altering the weights of the shared model. These properties enable our attack to scale to models trained with large mini-batches of data. Where attackers from prior work require hours to recover a single data point, our method needs milliseconds to capture the full mini-batch of data from both fully-connected and convolutional deep neural networks. Finally, we consider mitigations. We observe that current implementations of differential privacy (DP) in FL are flawed, as they explicitly trust the central party with the crucial task of adding DP noise, and thus provide no protection against a malicious central party. We also consider other defenses and explain why they are similarly inadequate. A significant redesign of FL is required for it to provide any meaningful form of data privacy to users.
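The leakage that trap weights amplify can be seen in a toy example: for a fully-connected ReLU layer, if a neuron is activated by exactly one example in the mini-batch, the shared gradients satisfy $\partial L/\partial W_j = (\partial L/\partial b_j)\, x_i^\top$, so that example is recovered by a single division. The sketch below demonstrates this with an illustrative initialization (a negative bias that makes neurons fire rarely); it is not the paper's exact trap-weight construction.

```python
import torch

torch.manual_seed(0)

# Toy setup: users' mini-batch passes through one fully-connected ReLU layer
# whose weights the (malicious) central party chose before sending the model.
batch, d_in, d_hidden = 8, 32, 256
x = torch.rand(batch, d_in)                    # private user data, entries in [0, 1]
layer = torch.nn.Linear(d_in, d_hidden)
with torch.no_grad():
    layer.weight.normal_()                     # illustrative initialization, not the
    layer.bias.fill_(-3.0)                     # paper's construction: the negative bias
                                               # makes each neuron fire only rarely

h = torch.relu(layer(x))
loss = h.sum()                                 # stand-in for whatever loss users compute
loss.backward()                                # users would now share these gradients

# For a ReLU neuron activated by exactly one example i in the batch,
# dL/dW_row = (dL/db_row) * x_i, so x_i is recovered by one division.
gW, gb = layer.weight.grad, layer.bias.grad
pre_act = layer(x)
active_per_neuron = (pre_act > 0).sum(dim=0)   # how many examples activate each neuron
for j in torch.nonzero(active_per_neuron == 1).flatten()[:3]:
    i = int((pre_act[:, j] > 0).nonzero())
    recovered = gW[j] / gb[j]
    print(f"neuron {int(j)}: reconstruction error "
          f"{(recovered - x[i]).abs().max().item():.2e}")
```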
An important problem in deep learning is the privacy and security of neural networks (NNs). Both aspects have long been considered separately. To date, it is still poorly understood how privacy-enhancing training affects the robustness of NNs. This paper experimentally evaluates the impact of training with Differential Privacy (DP), a standard method for privacy preservation, on model vulnerability against a broad range of adversarial attacks. The results suggest that private models are less robust than their non-private counterparts, and that adversarial examples transfer better among DP models than between non-private and private ones. Furthermore, detailed analyses of DP and non-DP models suggest significant differences between their gradients. Additionally, this work is the first to observe that an unfavorable choice of parameters in DP training can lead to gradient masking, and thereby result in a false sense of security.
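A sketch of the kind of comparison behind these results is given below: given an already trained DP model and a non-private model (e.g., trained with and without DP-SGD), it measures robust accuracy under a single-step FGSM attack and the transferability of adversarial examples between the two. The attack, the toy models, and the data are simplified placeholders rather than the paper's full evaluation protocol.

```python
import torch

def fgsm(model, x, y, eps):
    """Single-step FGSM adversarial examples; inputs assumed to lie in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def robust_accuracy(model, x_adv, y):
    with torch.no_grad():
        return (model(x_adv).argmax(dim=1) == y).float().mean().item()

def compare_robustness(dp_model, nonprivate_model, x, y, eps=0.1):
    """Robust accuracy and cross-model transferability for a DP-trained and a
    non-private model (both assumed to be trained already, e.g. with/without DP-SGD)."""
    adv_dp = fgsm(dp_model, x, y, eps)
    adv_np = fgsm(nonprivate_model, x, y, eps)
    return {
        "dp model, own attack":           robust_accuracy(dp_model, adv_dp, y),
        "non-private model, own attack":  robust_accuracy(nonprivate_model, adv_np, y),
        "transfer non-private -> dp":     robust_accuracy(dp_model, adv_np, y),
        "transfer dp -> non-private":     robust_accuracy(nonprivate_model, adv_dp, y),
    }

if __name__ == "__main__":
    # Untrained stand-ins so the sketch runs end-to-end; real use would pass
    # models trained with and without DP-SGD on the same task.
    dp_model = torch.nn.Linear(784, 10)
    nonprivate_model = torch.nn.Linear(784, 10)
    x, y = torch.rand(64, 784), torch.randint(0, 10, (64,))
    for name, acc in compare_robustness(dp_model, nonprivate_model, x, y).items():
        print(f"{name}: {acc:.2f}")
```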
Machine learning (ML) models are applied in an increasing variety of domains. The availability of large amounts of data and computational resources encourages the development of ever more complex and valuable models. These models are considered the intellectual property of the legitimate parties who have trained them, which makes their protection against stealing, illegitimate redistribution, and unauthorized application an urgent need. Digital watermarking presents a strong mechanism for marking model ownership and, thereby, offers protection against those threats. The emergence of numerous watermarking schemes and attacks against them is pushed forward by both academia and industry, which motivates a comprehensive survey of this field. This document provides the first extensive literature review on ML model watermarking schemes and attacks against them. It offers a taxonomy of existing approaches and systemizes general knowledge around them. Furthermore, it assembles the security requirements for watermarking approaches and evaluates schemes published by the scientific community against these requirements in order to expose systematic shortcomings and vulnerabilities. Thus, it can not only serve as valuable guidance in choosing the appropriate scheme for specific scenarios, but also act as an entry point into developing new mechanisms that overcome the presented shortcomings, and thereby contribute to advancing the field.
Computational approaches to the analysis of collective behavior in social insects increasingly rely on motion paths as an intermediate data layer from which one can infer individual behaviors or social interactions. Honey bees are a popular model organism for studying learning and memory. Previous experience has been shown to affect and modulate future social interactions. So far, no lifetime history observations have been reported for all bees of a colony. In previous work, we introduced a tracking system customized to track up to $4000$ bees over several weeks. In this contribution, we present an in-depth description of the underlying multi-step algorithm, which both produces the motion paths and significantly improves the marker decoding accuracy. We automatically tracked ${\sim}2000$ marked honey bees over 10 weeks with inexpensive recording hardware, using markers without any error-correction bits. We found that the proposed two-step tracking reduced incorrect ID decodings from initially ${\sim}13\%$ to around $2\%$ post-tracking. Alongside this paper, we publish the first trajectory dataset for all bees in a colony, extracted from ${\sim}4$ million images. We invite researchers to join the collective scientific effort to investigate this intriguing animal system. All components of our system are open-source.
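One way that linking detections into trajectories can repair per-frame decoding errors is to aggregate the noisy bit estimates of a marker over a whole track before committing to an ID, as in the simplified sketch below. The bit-averaging scheme and the 12-bit marker are illustrative assumptions; the actual multi-step pipeline described in the paper is considerably more involved.

```python
import numpy as np

def track_id_by_bitwise_vote(per_frame_bit_probs):
    """Assign one ID to a whole trajectory from noisy per-frame decodings.

    `per_frame_bit_probs` is an (n_frames, n_bits) array where each entry is the
    decoder's probability that the corresponding marker bit is 1 in that frame.
    Averaging the per-bit probabilities over the track and thresholding tends to
    be far more reliable than any single-frame decoding. (Simplified sketch of
    the general idea, not the paper's exact pipeline.)
    """
    mean_probs = np.asarray(per_frame_bit_probs).mean(axis=0)
    bits = (mean_probs > 0.5).astype(int)
    return int("".join(map(str, bits)), 2)     # integer ID from the bit string

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_bits = rng.integers(0, 2, size=12)    # a 12-bit marker, no error-correction bits
    # Noisy per-frame decodings of the same bee over 50 frames of one track.
    per_frame = np.clip(true_bits + rng.normal(0, 0.35, size=(50, 12)), 0, 1)
    print("true ID:     ", int("".join(map(str, true_bits)), 2))
    print("recovered ID:", track_id_by_bitwise_vote(per_frame))
```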