Luca Melis

Federated Linear Contextual Bandits with User-level Differential Privacy

Jun 09, 2023
Ruiquan Huang, Huanyu Zhang, Luca Melis, Milan Shen, Meisam Hajzinia, Jing Yang

This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed $\texttt{ROBIN}$ and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon,M\}$ or $\min\{1/\sqrt{\varepsilon},\sqrt{M}\}$ under different conditions.
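
As a rough, hedged illustration of the general recipe behind such private federated bandit schemes (not the paper's $\texttt{ROBIN}$ algorithm), the sketch below shows a client clipping its contexts and adding Gaussian noise to LinUCB-style sufficient statistics before federating them. The function name, clipping rule, and noise calibration are illustrative assumptions and ignore the per-user composition that a full user-level DP analysis requires.

```python
import numpy as np

def privatize_local_statistics(contexts, rewards, epsilon, delta, clip_norm=1.0, rng=None):
    """Hypothetical sketch: a client perturbs its LinUCB-style sufficient
    statistics (Gram matrix V = sum x x^T and vector u = sum r x) with Gaussian
    noise before sending them to the server. Not the ROBIN algorithm."""
    rng = rng if rng is not None else np.random.default_rng()
    d = contexts.shape[1]
    # Clip each context so that any single contribution has bounded norm.
    norms = np.linalg.norm(contexts, axis=1, keepdims=True)
    x = contexts * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    V = x.T @ x                    # d x d Gram matrix
    u = x.T @ rewards              # d-dimensional reward-weighted sum
    # Gaussian-mechanism noise scale for a nominal (epsilon, delta) budget; a real
    # user-level guarantee must also account for how many rounds one user contributes.
    sigma = (clip_norm ** 2) * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noise = rng.normal(0.0, sigma, size=(d, d))
    noisy_V = V + (noise + noise.T) / 2.0           # keep the noisy Gram matrix symmetric
    noisy_u = u + rng.normal(0.0, sigma, size=d)
    return noisy_V, noisy_u
```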

* Accepted by ICML 2023 

EXACT: Extensive Attack for Split Learning

May 25, 2023
Xinchi Qiu, Ilias Leontiadis, Luca Melis, Alex Sablayrolles, Pierre Stock

Privacy-preserving machine learning (PPML) can help us train and deploy models that utilize private information. In particular, on-device machine learning allows us to completely avoid sharing information with a third-party server during inference. However, on-device models are typically less accurate than their server counterparts because (1) they usually rely only on a small set of on-device features and (2) they need to be small enough to run efficiently on end-user devices. Split Learning (SL) is a promising approach that can overcome these limitations. In SL, a large machine learning model is divided into two parts, with the larger part residing on the server side and the smaller part executing on-device, with the aim of incorporating the private features. However, end-to-end training of such models requires exchanging gradients at the cut layer, which might encode private features or labels. In this paper, we provide insights into the potential privacy risks associated with SL and introduce a novel attack method, EXACT, to reconstruct private information. We also investigate the effectiveness of various mitigation strategies. Our results indicate that the exchanged gradients significantly improve the attacker's effectiveness across all three datasets, reaching almost 100% reconstruction accuracy for some features. However, a small amount of differential privacy (DP) is quite effective at mitigating this risk without causing significant training degradation.
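
To make the cut-layer exchange concrete, here is a minimal PyTorch sketch of one split-learning step under toy shapes; it is not the paper's setup, but it shows which gradient tensor an attack like EXACT would target.

```python
import torch
import torch.nn as nn

# Toy split of a model into an on-device part and a server part.
device_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU())   # holds the private features
server_model = nn.Sequential(nn.Linear(32, 2))               # larger part, runs server-side

x_private = torch.randn(8, 16)                # private on-device features
y = torch.randint(0, 2, (8,))                 # labels (assumed available to the server here)

cut = device_model(x_private)                 # cut-layer activations sent to the server
cut_server = cut.detach().requires_grad_(True)
loss = nn.functional.cross_entropy(server_model(cut_server), y)
loss.backward()                               # server backprops down to the cut layer

grad_at_cut = cut_server.grad                 # <-- the exchanged gradient an attacker sees
cut.backward(grad_at_cut)                     # device finishes backprop with that gradient
```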

* 10 pages 

FEL: High Capacity Learning for Recommendation and Ranking via Federated Ensemble Learning

Jun 07, 2022
Meisam Hejazinia, Dzmitry Huba, Ilias Leontiadis, Kiwan Maeng, Mani Malek, Luca Melis, Ilya Mironov, Milad Nasr, Kaikai Wang, Carole-Jean Wu

Federated learning (FL) has emerged as an effective approach to addressing consumer privacy needs. FL has been successfully applied to certain machine learning tasks, such as training smart keyboard models and keyword spotting. Despite FL's initial success, many important deep learning use cases, such as ranking and recommendation tasks, have remained out of reach for on-device learning. One of the key challenges facing practical FL adoption for DL-based ranking and recommendation is the prohibitive resource requirement that cannot be satisfied by modern mobile systems. We propose Federated Ensemble Learning (FEL) as a solution to the large memory requirements of deep learning ranking and recommendation tasks. FEL enables large-scale ranking and recommendation model training on-device by simultaneously training multiple model versions on disjoint clusters of client devices. FEL integrates the trained sub-models via an over-arch layer into an ensemble model hosted on the server. Our experiments demonstrate that FEL yields a 0.43-2.31% model quality improvement over traditional on-device federated learning, a significant improvement for ranking and recommendation system use cases.
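
A minimal sketch of the server-side ensembling idea, assuming hypothetical layer sizes and a simple linear over-arch layer; it illustrates the concept rather than the paper's FEL implementation.

```python
import torch
import torch.nn as nn

class OverArchEnsemble(nn.Module):
    """Hypothetical server-side ensemble: sub-models trained on disjoint client
    clusters are frozen and combined through a small over-arch layer."""
    def __init__(self, sub_models, feature_dim, num_outputs=1):
        super().__init__()
        self.sub_models = nn.ModuleList(sub_models)
        for m in self.sub_models:
            m.requires_grad_(False)            # sub-models arrive already trained via FL
        self.over_arch = nn.Linear(len(sub_models) * feature_dim, num_outputs)

    def forward(self, x):
        feats = torch.cat([m(x) for m in self.sub_models], dim=-1)
        return self.over_arch(feats)           # ensemble score produced on the server

# Three toy sub-models standing in for models trained on disjoint device clusters.
subs = [nn.Sequential(nn.Linear(10, 4), nn.ReLU()) for _ in range(3)]
ensemble = OverArchEnsemble(subs, feature_dim=4)
scores = ensemble(torch.randn(5, 10))
```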

Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

May 30, 2022
Kiwan Maeng, Haiyu Lu, Luca Melis, John Nguyen, Mike Rabbat, Carole-Jean Wu

Federated learning (FL) is an effective mechanism for protecting data privacy in recommender systems because model training runs on-device. While prior FL optimizations tackled the data and system heterogeneity challenges faced by FL, they assumed the two are independent of each other. This fundamental assumption does not reflect real-world, large-scale recommender systems: data and system heterogeneity are tightly intertwined. This paper takes a data-driven approach to show the inter-dependence of data and system heterogeneity in real-world data and quantifies its impact on overall model quality and fairness. We design a framework, RF^2, to model the inter-dependence and evaluate its impact on state-of-the-art model optimization techniques for federated recommendation tasks. We demonstrate that the impact on fairness can be severe under realistic heterogeneity scenarios, up to 15.8-41x worse than under the simple setup assumed in most (if not all) prior work. This means that when realistic system-induced data heterogeneity is not properly modeled, the fairness impact of an optimization can be downplayed by up to 41x. The result shows that modeling realistic system-induced data heterogeneity is essential to achieving fair federated recommendation learning. We plan to open-source RF^2 to enable future design and evaluation of FL innovations.
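
As a toy illustration of system-induced data heterogeneity (not the RF^2 framework itself), the sketch below samples clients whose data volume is correlated with their device tier; the tiers, rates, and correlation knob are made-up assumptions.

```python
import numpy as np

def sample_clients(num_clients=1000, corr_strength=0.8, seed=0):
    """Toy model: a client's device tier (system heterogeneity) is correlated
    with how much data it holds (data heterogeneity), rather than the two
    being drawn independently. All numbers are made up for illustration."""
    rng = np.random.default_rng(seed)
    tier = rng.choice([0, 1, 2], size=num_clients, p=[0.5, 0.3, 0.2])   # low/mid/high-end
    base = rng.poisson(lam=20, size=num_clients)                        # tier-independent data
    boost = rng.poisson(lam=40 * tier, size=num_clients)                # tier-linked data
    num_samples = base + (corr_strength * boost).astype(int)
    return tier, num_samples

tier, n = sample_clients()
# Dropping or down-weighting slow devices now also skews the training data,
# which is the kind of inter-dependence the abstract argues must be modeled.
```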

Differentially Private Query Release Through Adaptive Projection

Mar 11, 2021
Sergul Aydore, William Brown, Michael Kearns, Krishnaram Kenthapadi, Luca Melis, Aaron Roth, Ankit Siva

We propose, implement, and evaluate a new algorithm for releasing answers to very large numbers of statistical queries, such as $k$-way marginals, subject to differential privacy. Our algorithm makes adaptive use of a continuous relaxation of the Projection Mechanism, which answers queries on the private dataset using simple perturbation and then attempts to find the synthetic dataset that most closely matches the noisy answers. We use a continuous relaxation of the synthetic dataset domain, which makes the projection loss differentiable and allows us to use efficient ML optimization techniques and tooling. Rather than answering all queries up front, we make judicious use of our privacy budget by iteratively and adaptively finding queries for which our (relaxed) synthetic data has high error, and then repeating the projection. We perform extensive experimental evaluations across a range of parameters and datasets, and find that our method outperforms existing algorithms in many cases, especially when the privacy budget is small or the query class is large.
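
A hedged toy sketch of the "project onto noisy answers" idea with 1-way marginals and gradient descent over a relaxed synthetic dataset; the adaptive selection of high-error queries is omitted, and the dataset, noise scale, and relaxation are illustrative assumptions rather than the paper's implementation.

```python
import torch

torch.manual_seed(0)
private = (torch.rand(500, 8) > 0.5).float()      # toy binary private dataset

def query_answers(data):                          # toy query class: 1-way marginals
    return data.mean(dim=0)

sigma = 0.05                                      # noise scale set by the privacy budget
noisy = query_answers(private) + sigma * torch.randn(8)

# Continuous relaxation: synthetic rows live in [0, 1] via a sigmoid of free logits,
# so the projection loss is differentiable and standard optimizers apply.
logits = torch.zeros(100, 8, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = ((query_answers(torch.sigmoid(logits)) - noisy) ** 2).sum()
    loss.backward()
    opt.step()

synthetic = (torch.sigmoid(logits) > 0.5).float() # round the relaxation back to a dataset
```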

Adversarial Robustness with Non-uniform Perturbations

Feb 24, 2021
Ecenaz Erdemir, Jeffrey Bickford, Luca Melis, Sergul Aydore

The robustness of machine learning models is critical for security-related applications, where real-world adversaries are uniquely focused on evading neural-network-based detectors. Prior work mainly focuses on crafting adversarial examples with small, uniform, norm-bounded perturbations across features to maintain the requirement of imperceptibility. Although such approaches are valid for images, uniform perturbations do not result in realistic adversarial examples in domains such as malware, finance, and social networks. For these types of applications, features typically have semantically meaningful dependencies. The key idea of our proposed approach is to enable non-uniform perturbations that adequately represent these feature dependencies during adversarial training. We propose using characteristics of the empirical data distribution, drawing on both correlations between the features and the importance of the features themselves. Using experimental datasets for malware classification, credit risk prediction, and spam detection, we show that our approach is more robust to real-world attacks. Our approach can be adapted to other domains where non-uniform perturbations more accurately represent realistic adversarial examples.
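
A minimal sketch of adversarial example generation with a per-feature (non-uniform) perturbation budget, assuming a PGD-style inner loop and a hand-picked budget vector; it illustrates the idea of non-uniform perturbations rather than the paper's exact construction.

```python
import torch
import torch.nn as nn

def pgd_nonuniform(model, x, y, eps, steps=10, step_frac=0.25):
    """PGD-style attack where each feature j has its own budget eps[j]
    (e.g. scaled by feature importance or correlation structure), so the
    perturbation is non-uniform across features. Illustrative only."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta = delta + step_frac * eps * grad.sign()
            delta = torch.max(torch.min(delta, eps), -eps)   # per-feature projection
    return (x + delta).detach()

# Toy usage: immutable features get a zero budget, flexible ones a larger budget.
model = nn.Sequential(nn.Linear(6, 2))
x, y = torch.randn(4, 6), torch.randint(0, 2, (4,))
eps = torch.tensor([0.0, 0.1, 0.1, 0.5, 0.5, 1.0])
x_adv = pgd_nonuniform(model, x, y, eps)
```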

* 21 pages, 6 figures 

Exploiting Unintended Feature Leakage in Collaborative Learning

Nov 01, 2018
Luca Melis, Congzheng Song, Emiliano De Cristofaro, Vitaly Shmatikov

Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with their own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data, and we develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points -- for example, specific locations -- in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, the adversary can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.
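
A hedged sketch of the passive property-inference idea on toy data: gradients of a shadow model computed on batches with and without a target property train a meta-classifier that can then score observed updates. The shadow model, the synthetic "property shift", and the meta-classifier choice are illustrative assumptions, not the paper's pipeline.

```python
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)
shadow = nn.Linear(20, 2)                  # stand-in for the jointly trained model

def batch_gradient(x, y):
    """Flattened gradient of the shadow model on one batch."""
    shadow.zero_grad()
    nn.functional.cross_entropy(shadow(x), y).backward()
    return torch.cat([p.grad.flatten() for p in shadow.parameters()]).numpy()

feats, labels = [], []
for has_property in (0, 1):
    for _ in range(200):
        # Toy "property": a mean shift in the features of batches that have it.
        x = torch.randn(32, 20) + (0.5 if has_property else 0.0)
        y = torch.randint(0, 2, (32,))
        feats.append(batch_gradient(x, y))
        labels.append(has_property)

meta = LogisticRegression(max_iter=1000).fit(feats, labels)
# meta.predict([...]) can now be applied to gradients observed from other participants.
```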

* Proceedings of 40th IEEE Symposium on Security & Privacy (S&P 2019) 

Building and Measuring Privacy-Preserving Predictive Blacklists

Oct 08, 2018
Luca Melis, Apostolos Pyrgelis, Emiliano De Cristofaro

(Withdrawn) Collaborative security initiatives are increasingly advocated to improve the timeliness and effectiveness of threat mitigation. Among these, collaborative predictive blacklisting (CPB) aims to forecast attack sources based on alerts contributed by multiple organizations that might be targeted in similar ways. Alas, CPB proposals thus far have focused only on improving hit counts, overlooking the impact of collaboration on false positives and false negatives. Moreover, sharing threat intelligence often prompts important privacy, confidentiality, and liability issues. In this paper, we first provide a comprehensive measurement analysis of two state-of-the-art CPB systems: one that uses a trusted central party to collect alerts [Soldo et al., Infocom'10] and a peer-to-peer one relying on controlled data sharing [Freudiger et al., DIMVA'15], studying the impact of collaboration on both correct and incorrect predictions. Then, we present a novel privacy-friendly approach that significantly improves over previous work, achieving a better balance of true and false positive rates while minimizing information disclosure. Finally, we present an extension that allows our system to scale to very large numbers of organizations.

* Obsolete paper. For more up-to-date work on collaborative predictive blacklisting, see arXiv:1810.02649 

LOGAN: Membership Inference Attacks Against Generative Models

Aug 21, 2018
Jamie Hayes, Luca Melis, George Danezis, Emiliano De Cristofaro

Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution. In this paper, we present the first membership inference attacks against generative models: given a data point, the adversary determines whether or not it was used to train the model. Our attacks leverage Generative Adversarial Networks (GANs), which combine a discriminative and a generative model, to detect overfitting and recognize inputs that were part of training datasets, using the discriminator's capacity to learn statistical differences in distributions. We present attacks based on both white-box and black-box access to the target model, against several state-of-the-art generative models, over datasets of complex representations of faces (LFW), objects (CIFAR-10), and medical images (Diabetic Retinopathy). We also discuss the sensitivity of the attacks to different training parameters, and their robustness against mitigation strategies, finding that defenses are either ineffective or lead to significantly worse performance of the generative models in terms of training stability and/or sample quality.
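
A minimal sketch of the white-box variant's core idea, assuming a toy discriminator and random candidate data: score candidates with the discriminator and flag the top-scoring ones as likely training members. Names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for a discriminator obtained with white-box access to the target GAN.
discriminator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

candidates = torch.randn(1000, 64)            # records the adversary wants to test
with torch.no_grad():
    scores = discriminator(candidates).squeeze(1)

k = 100                                        # adversary's guess at the training-set size
predicted_members = torch.topk(scores, k).indices   # highest-scoring records flagged as members
```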

* Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2019, Issue 1  