
Giovanni Cherubin


SoK: Let The Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

Dec 21, 2022
Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin

Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the other, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning.
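
As a rough illustration of the game-based style the paper systematizes (this sketch is not taken from the paper), a membership-inference game can be written as a probabilistic experiment in which a challenger flips a secret bit and an adversary tries to guess it. The names `train`, `adversary`, and `sample_record` below are hypothetical placeholders.

```python
import random

def membership_inference_game(train, adversary, sample_record, n):
    """One round of a membership-inference game (a probabilistic experiment).

    The challenger samples a training set and a challenge record, flips a
    secret bit b deciding whether the record is included in training, trains
    a model, and the adversary must guess b given the model and the record.
    """
    dataset = [sample_record() for _ in range(n)]   # records from the data distribution
    z = sample_record()                             # challenge record
    b = random.randint(0, 1)                        # secret bit
    model = train(dataset + [z]) if b == 1 else train(dataset)
    return int(adversary(model, z) == b)            # 1 if the adversary wins

# The adversary's advantage is estimated as |2 * (fraction of wins) - 1|
# over many independent rounds of the game.
```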

Synthetic Data -- what, why and how?

May 06, 2022
James Jordon, Lukasz Szpruch, Florimond Houssiau, Mirko Bottarelli, Giovanni Cherubin, Carsten Maple, Samuel N. Cohen, Adrian Weller

This explainer document aims to provide an overview of the current state of the rapidly expanding work on synthetic data technologies, with a particular focus on privacy. The article is intended for a non-technical audience, though some formal definitions have been given to provide clarity to specialists. This article is intended to enable the reader to quickly become familiar with the notion of synthetic data, as well as understand some of the subtle intricacies that come with it. We do believe that synthetic data is a very useful tool, and our hope is that this report highlights that, while drawing attention to nuances that can easily be overlooked in its deployment.

* Commissioned by the Royal Society. 57 pages, 2 figures 

Approximating Full Conformal Prediction at Scale via Influence Functions

Feb 02, 2022
Javier Abad, Umang Bhatt, Adrian Weller, Giovanni Cherubin

Conformal prediction (CP) is a wrapper around traditional machine learning models, giving coverage guarantees under the sole assumption of exchangeability; in classification problems, for a chosen significance level $\varepsilon$, CP guarantees that the error rate is at most $\varepsilon$, irrespective of whether the underlying model is misspecified. However, the prohibitive computational costs of full CP led researchers to design scalable alternatives, which alas do not attain the same guarantees or statistical power as full CP. In this paper, we use influence functions to efficiently approximate full CP. We prove that our method is a consistent approximation of full CP, and empirically show that the approximation error becomes smaller as the training set grows; e.g., for $10^{3}$ training points the two methods output p-values that are $<10^{-3}$ apart: a negligible error for any practical application. Our method enables scaling full CP to large real-world datasets. We compare our full CP approximation, ACP, to mainstream CP alternatives, and observe that our method is computationally competitive whilst enjoying the statistical predictive power of full CP.
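
For context on where the cost of full CP comes from: each candidate label requires refitting the model on the training set augmented with the test point, once per label and per test example; this is the step ACP approximates with influence functions. A minimal sketch of plain full CP, assuming hypothetical `fit` and `nonconformity` callables:

```python
import numpy as np

def full_cp_pvalues(fit, nonconformity, X_train, y_train, x_test, labels):
    """Full (transductive) conformal prediction for a single test point.

    `fit` trains a model; `nonconformity(model, X, y)` returns one score per
    example (both are placeholders for an arbitrary ML method). For each
    candidate label the model is refit from scratch on the augmented data.
    """
    pvals = {}
    for y in labels:
        X_aug = np.vstack([X_train, x_test])
        y_aug = np.append(y_train, y)
        model = fit(X_aug, y_aug)                 # expensive: one refit per label
        scores = nonconformity(model, X_aug, y_aug)
        # p-value: fraction of scores at least as large as the test point's
        pvals[y] = np.mean(scores >= scores[-1])
    return pvals

# Prediction set at significance level eps: {y : pvals[y] > eps}; under
# exchangeability it contains the true label with probability >= 1 - eps.
```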

* 18 pages, 15 figures 

Reconstructing Training Data with Informed Adversaries

Jan 13, 2022
Borja Balle, Giovanni Cherubin, Jamie Hayes

Given access to a machine learning model, can an adversary reconstruct the model's training data? This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one. By instantiating concrete attacks, we show it is feasible to reconstruct the remaining data point in this stringent threat model. For convex models (e.g. logistic regression), reconstruction attacks are simple and can be derived in closed-form. For more general models (e.g. neural networks), we propose an attack strategy based on training a reconstructor network that receives as input the weights of the model under attack and produces as output the target data point. We demonstrate the effectiveness of our attack on image classifiers trained on MNIST and CIFAR-10, and systematically investigate which factors of standard machine learning pipelines affect reconstruction success. Finally, we theoretically investigate what amount of differential privacy suffices to mitigate reconstruction attacks by informed adversaries. Our work provides an effective reconstruction attack that model developers can use to assess memorization of individual points in general settings beyond those considered in previous works (e.g. generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.
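
A hedged sketch of the reconstructor-network idea for general models; every name here (`train_shadow`, `sample_target`, the architecture and optimizer settings) is an illustrative assumption rather than the paper's exact recipe:

```python
import torch
import torch.nn as nn

def train_reconstructor(train_shadow, sample_target, fixed_data,
                        n_shadow, epochs, dim_w, dim_x):
    """Train a reconstructor network mapping released model weights to the
    one training point the informed adversary does not know.

    `fixed_data` is the list of known training points; `train_shadow` returns
    the flattened weights (a tensor of length dim_w) of a model trained on
    fixed_data plus one candidate point; `sample_target` draws a candidate
    point as a tensor of length dim_x. All are hypothetical placeholders.
    """
    pairs = []
    for _ in range(n_shadow):
        z = sample_target()                    # candidate "missing" point
        w = train_shadow(fixed_data + [z])     # weights of the shadow model
        pairs.append((w, z))

    recon = nn.Sequential(nn.Linear(dim_w, 512), nn.ReLU(), nn.Linear(512, dim_x))
    opt = torch.optim.Adam(recon.parameters(), lr=1e-3)
    for _ in range(epochs):
        for w, z in pairs:
            opt.zero_grad()
            loss = nn.functional.mse_loss(recon(w), z)  # reconstruct the point
            loss.backward()
            opt.step()
    return recon   # applied to the weights of the model actually under attack
```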

Exact Optimization of Conformal Predictors via Incremental and Decremental Learning

Feb 05, 2021
Giovanni Cherubin, Konstantinos Chatzikokolakis, Martin Jaggi

Conformal Predictors (CP) are wrappers around ML methods, providing error guarantees under weak assumptions on the data distribution. They are suitable for a wide range of problems, from classification and regression to anomaly detection. Unfortunately, their high computational complexity limits their applicability to large datasets. In this work, we show that it is possible to speed up a CP classifier considerably, by studying it in conjunction with the underlying ML method, and by exploiting incremental and decremental learning. For methods such as k-NN, KDE, and kernel LS-SVM, our approach reduces the running time by one order of magnitude, whilst producing exact solutions. With similar ideas, we also achieve a linear speed-up for the harder case of bootstrapping. Finally, we extend these techniques to improve upon an optimization of k-NN CP for regression. We evaluate our findings empirically, and discuss when methods are suitable for CP optimization.
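
A minimal sketch of the idea, assuming a hypothetical learner object exposing `add`, `remove`, and `score` methods (i.e., supporting incremental and decremental updates), so each candidate label costs one cheap update instead of a full retrain:

```python
def cp_pvalue_incremental(model, X_train, y_train, x_test, y_cand):
    """p-value of one candidate label for a conformal predictor whose
    underlying method (e.g. k-NN, KDE, kernel LS-SVM) can be updated
    incrementally and decrementally; `add`, `remove`, and `score` are
    hypothetical methods of such a learner.
    """
    model.add(x_test, y_cand)                 # incremental update, no retraining
    scores = [model.score(x, y) for x, y in zip(X_train, y_train)]
    scores.append(model.score(x_test, y_cand))
    pval = sum(s >= scores[-1] for s in scores) / len(scores)
    model.remove(x_test, y_cand)              # decremental update restores the model
    return pval

# The prediction set at significance level eps contains every candidate label
# y with cp_pvalue_incremental(model, X_train, y_train, x_test, y) > eps.
```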
