Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gautam Kamath

Rethinking Benchmarks for Differentially Private Image Classification

Jan 27, 2026

Sabrina Mokhtari, Sara Kodeiri, Shubhankar Mohapatra, Florian Tramèr, Gautam Kamath

Abstract:We revisit benchmarks for differentially private image classification. We suggest a comprehensive set of benchmarks, allowing researchers to evaluate techniques for differentially private machine learning in a variety of settings, including with and without additional data, in convex settings, and on a variety of qualitatively different datasets. We further test established techniques on these benchmarks in order to see which ideas remain effective in different settings. Finally, we create a publicly available leader board for the community to track progress in differentially private machine learning.

* IEEE Technical Committee on Data Engineering 2025

Via

Access Paper or Ask Questions

Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph

Sep 19, 2025

Gautam Kamath, Alireza F. Pour, Matthew Regehr, David P. Woodruff

Abstract:We propose an algorithm with improved query-complexity for the problem of hypothesis selection under local differential privacy constraints. Given a set of $k$ probability distributions $Q$, we describe an algorithm that satisfies local differential privacy, performs $\tilde{O}(k^{3/2})$ non-adaptive queries to individuals who each have samples from a probability distribution $p$, and outputs a probability distribution from the set $Q$ which is nearly the closest to $p$. Previous algorithms required either $\Omega(k^2)$ queries or many rounds of interactive queries. Technically, we introduce a new object we dub the Scheff\'e graph, which captures structure of the differences between distributions in $Q$, and may be of more broad interest for hypothesis selection tasks.

Via

Access Paper or Ask Questions

Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning

Sep 08, 2025

William Xu, Yiwei Lu, Yihan Wang, Matthew Y. R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu

Abstract:Targeted data poisoning attacks pose an increasingly serious threat due to their ease of deployment and high success rates. These attacks aim to manipulate the prediction for a single test sample in classification models. Unlike indiscriminate attacks that aim to decrease overall test performance, targeted attacks present a unique threat to individual test instances. This threat model raises a fundamental question: what factors make certain test samples more susceptible to successful poisoning than others? We investigate how attack difficulty varies across different test instances and identify key characteristics that influence vulnerability. This paper introduces three predictive criteria for targeted data poisoning difficulty: ergodic prediction accuracy (analyzed through clean training dynamics), poison distance, and poison budget. Our experimental results demonstrate that these metrics effectively predict the varying difficulty of real-world targeted poisoning attacks across diverse scenarios, offering practitioners valuable insights for vulnerability assessment and understanding data poisoning attacks.

Via

Access Paper or Ask Questions

On the Learnability of Distribution Classes with Adaptive Adversaries

Sep 05, 2025

Tosca Lechner, Alex Bie, Gautam Kamath

Abstract:We consider the question of learnability of distribution classes in the presence of adaptive adversaries -- that is, adversaries capable of intercepting the samples requested by a learner and applying manipulations with full knowledge of the samples before passing it on to the learner. This stands in contrast to oblivious adversaries, who can only modify the underlying distribution the samples come from but not their i.i.d.\ nature. We formulate a general notion of learnability with respect to adaptive adversaries, taking into account the budget of the adversary. We show that learnability with respect to additive adaptive adversaries is a strictly stronger condition than learnability with respect to additive oblivious adversaries.

Via

Access Paper or Ask Questions

Optimal Differentially Private Sampling of Unbounded Gaussians

Mar 03, 2025

Valentio Iverson, Gautam Kamath, Argyris Mouzakis

Abstract:We provide the first $\widetilde{\mathcal{O}}\left(d\right)$-sample algorithm for sampling from unbounded Gaussian distributions under the constraint of $\left(\varepsilon, \delta\right)$-differential privacy. This is a quadratic improvement over previous results for the same problem, settling an open question of Ghazi, Hu, Kumar, and Manurangsi.

* 47 pages

Via

Access Paper or Ask Questions

BridgePure: Revealing the Fragility of Black-box Data Protection

Dec 30, 2024

Yihan Wang, Yiwei Lu, Xiao-Shan Gao, Gautam Kamath, Yaoliang Yu

Abstract:Availability attacks, or unlearnable examples, are defensive techniques that allow data owners to modify their datasets in ways that prevent unauthorized machine learning models from learning effectively while maintaining the data's intended functionality. It has led to the release of popular black-box tools for users to upload personal data and receive protected counterparts. In this work, we show such black-box protections can be substantially bypassed if a small set of unprotected in-distribution data is available. Specifically, an adversary can (1) easily acquire (unprotected, protected) pairs by querying the black-box protections with the unprotected dataset; and (2) train a diffusion bridge model to build a mapping. This mapping, termed BridgePure, can effectively remove the protection from any previously unseen data within the same distribution. Under this threat model, our method demonstrates superior purification performance on classification and style mimicry tasks, exposing critical vulnerabilities in black-box data protection.

* 26 pages,13 figures

Via

Access Paper or Ask Questions

The Broader Landscape of Robustness in Algorithmic Statistics

Dec 03, 2024

Gautam Kamath

Abstract:The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints. An estimator may be robust in a number of different ways: to contamination of the dataset, to heavy-tailed data, or in the sense that it preserves privacy of the dataset. We survey recent results in these areas with a focus on the problem of mean estimation, drawing technical and conceptual connections between the various forms of robustness, showing that the same underlying algorithmic ideas lead to computationally efficient estimators in all these settings.

Via

Access Paper or Ask Questions

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Sep 29, 2024

Jie Zhang, Debeshee Das, Gautam Kamath, Florian Tramèr

Figure 1 for Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Figure 2 for Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Figure 3 for Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Figure 4 for Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Abstract:We consider the problem of a training data proof, where a data creator or owner wants to demonstrate to a third party that some machine learning model was trained on their data. Training data proofs play a key role in recent lawsuits against foundation models trained on web-scale data. Many prior works suggest to instantiate training data proofs using membership inference attacks. We argue that this approach is fundamentally unsound: to provide convincing evidence, the data creator needs to demonstrate that their attack has a low false positive rate, i.e., that the attack's output is unlikely under the null hypothesis that the model was not trained on the target data. Yet, sampling from this null hypothesis is impossible, as we do not know the exact contents of the training set, nor can we (efficiently) retrain a large foundation model. We conclude by offering two paths forward, by showing that data extraction attacks and membership inference on special canary data can be used to create sound training data proofs.

Via

Access Paper or Ask Questions

Machine Unlearning Fails to Remove Data Poisoning Attacks

Jun 25, 2024

Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel

Abstract:We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of training on poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of evaluation settings (e.g., alleviating membership inference attacks), they fail to remove the effects of data poisoning, across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs); even when granted a relatively large compute budget. In order to precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, is required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful to efficiently remove poisoned datapoints without having to retrain, our work suggests that these methods are not yet "ready for prime time", and currently provide limited benefit over retraining.

Via

Access Paper or Ask Questions

Distribution Learnability and Robustness

Jun 25, 2024

Shai Ben-David, Alex Bie, Gautam Kamath, Tosca Lechner

Abstract:We examine the relationship between learnability and robust (or agnostic) learnability for the problem of distribution learning. We show that, contrary to other learning settings (e.g., PAC learning of function classes), realizable learnability of a class of probability distributions does not imply its agnostic learnability. We go on to examine what type of data corruption can disrupt the learnability of a distribution class and what is such learnability robust against. We show that realizable learnability of a class of distributions implies its robust learnability with respect to only additive corruption, but not against subtractive corruption. We also explore related implications in the context of compression schemes and differentially private learnability.

* In NeurIPS 2023

Via

Access Paper or Ask Questions