Preserving privacy while ensuring communication efficiency is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model and propose a solution that achieves both objectives simultaneously. We show that a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal (Gaussian) noise addition process at the aggregator. As a result, we can guarantee the same level of differential privacy against other clients while substantially reducing the required communication, compared with transmitting full-precision gradients and adding noise centrally. We also demonstrate experimentally that the accuracy of the proposed approach matches that of the full-precision gradient method.
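As a rough illustration of the mechanism, the following NumPy sketch implements subtractive dithered quantization with a dither generated from a seed shared between client and aggregator; the function names, step size, and client count are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dithered_quantize(x, step, seed):
    """Client side: subtractive dithered quantization. The dither comes
    from a PRNG seeded with a value shared with the aggregator, so it
    can be subtracted again on reception."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-step / 2, step / 2, size=x.shape)  # shared dither
    return np.round((x + u) / step)                     # low-rate integer indices

def dithered_dequantize(q, step, seed):
    """Aggregator side: regenerate and subtract the same dither,
    recovering x + e with e ~ Uniform(-step/2, step/2) independent of x."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-step / 2, step / 2, size=q.shape)
    return q * step - u

# Summing the reconstructions of many clients: the i.i.d. quantization
# errors accumulate into approximately Gaussian noise, mimicking central
# noise addition at the aggregator.
step, n_clients = 0.1, 50
grads = [np.random.randn(1000) for _ in range(n_clients)]
recon = sum(dithered_dequantize(dithered_quantize(g, step, seed=i), step, seed=i)
            for i, g in enumerate(grads))
print(np.abs(recon - sum(grads)).max())  # bounded by n_clients * step / 2
```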
Running a randomized algorithm on a subsampled dataset instead of the entire dataset amplifies its differential privacy guarantees. In this work, we consider a federated setting in which clients participate at random, in addition to subsampling their local datasets. Since random client participation correlates the subsampling decisions for samples belonging to the same client, we analyze the resulting privacy amplification as amplification via non-uniform subsampling. We show that when the local datasets are small, the privacy guarantees obtained via random participation are close to those of the centralized setting, in which the entire dataset resides on a single host and is subsampled there. When the local datasets are large, on the other hand, observing the output of the algorithm may reveal the identities of the sampled clients with high confidence. Our analysis shows that, even in this case, the privacy guarantees obtained via random participation outperform those obtained through local subsampling alone.
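The standard amplification-by-subsampling bound gives a feel for the effect. The toy computation below (the function name `amplified_epsilon` is ours) treats random participation as simply lowering the effective sampling rate from q to p*q; this ignores the inter-client correlation that the actual analysis handles, so it is only a back-of-the-envelope sketch.

```python
import numpy as np

def amplified_epsilon(eps, q):
    """Standard privacy amplification by Poisson subsampling: an
    (eps, delta)-DP mechanism run on a q-subsampled dataset satisfies
    (log(1 + q*(e^eps - 1)), q*delta)-DP."""
    return np.log1p(q * np.expm1(eps))

# With client participation probability p and local subsampling rate
# q_local, a given sample is (naively) used with probability p * q_local,
# versus q_local under local subsampling alone.
eps, p, q_local = 1.0, 0.1, 0.01
print(amplified_epsilon(eps, q_local))      # local subsampling only
print(amplified_epsilon(eps, p * q_local))  # plus random participation
```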
We consider distributed inference at the wireless edge, where multiple clients holding an ensemble of models, each trained independently on a local dataset, are queried in parallel to make an accurate decision on a new sample. In addition to maximizing inference accuracy, we also want to maximize the privacy of the local models. We exploit the superposition property of the wireless medium to implement bandwidth-efficient ensemble inference methods. We introduce several over-the-air ensemble methods and show that they perform significantly better than their orthogonal counterparts while using fewer channel resources and providing privacy guarantees. We also provide experimental results verifying the benefits of the proposed over-the-air inference approach, whose source code is shared publicly on GitHub.
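The following NumPy sketch simulates the basic idea under simplifying assumptions of ours (ideal unit channel gains, Gaussian receiver noise, random stand-in models): every client transmits its soft decision simultaneously, so the receiver observes their noisy sum in a number of channel uses independent of the ensemble size, whereas orthogonal transmission would need one slot per client.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_classes, noise_std = 10, 5, 0.5

# Each client's model outputs a probability vector for the query sample
# (random here, standing in for locally trained models).
local_probs = rng.dirichlet(np.ones(n_classes), size=n_clients)

# Over-the-air aggregation: all clients transmit their soft decisions
# simultaneously; the channel superposes them and the receiver adds noise.
# Only n_classes channel uses are needed, regardless of n_clients.
rx = local_probs.sum(axis=0) + rng.normal(0, noise_std, n_classes)
print("over-the-air ensemble decision:", int(np.argmax(rx)))
```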
While rich medical datasets are hosted in hospitals distributed across the world, concerns about patients' privacy are a barrier against using such data to train deep neural networks (DNNs) for medical diagnostics. We propose Dopamine, a system to train DNNs on distributed datasets, which employs federated learning (FL) with differentially private stochastic gradient descent (DPSGD) and, in combination with secure aggregation, can establish a better trade-off between the differential privacy (DP) guarantee and DNN accuracy than other approaches. Results on a diabetic retinopathy (DR) task show that Dopamine provides a DP guarantee close to that of its centralized training counterpart, while achieving better classification accuracy than FL with parallel DP, where DPSGD is applied without coordination. Code is available at https://github.com/ipc-lab/private-ml-for-health.
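For readers unfamiliar with DPSGD, the sketch below shows the per-sample clipping and noise addition at the heart of a single update; it is a generic single-machine illustration with hypothetical parameter names, not Dopamine's federated implementation (see the repository above for that).

```python
import numpy as np

def dpsgd_update(params, per_sample_grads, lr, clip_norm, noise_mult, rng):
    """One DPSGD step: clip each per-sample gradient to clip_norm,
    average, and add Gaussian noise calibrated to the clipping bound."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_sample_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

# Toy usage: one batch of 32 per-sample gradients for a 10-dim model.
rng = np.random.default_rng(0)
w = np.zeros(10)
grads = [rng.standard_normal(10) for _ in range(32)]
w = dpsgd_update(w, grads, lr=0.1, clip_norm=1.0, noise_mult=1.1, rng=rng)
```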
We study the problem of differentially private wireless federated learning (FL). In conventional FL, differential privacy (DP) guarantees can be obtained by injecting noise into the local model updates before they are transmitted to the parameter server (PS). In the wireless FL scenario, we show that the privacy of the system can be boosted by exploiting over-the-air computation (OAC) and anonymizing the transmitting devices. In OAC, devices transmit their model updates simultaneously and in an uncoded fashion, which results in a much more efficient use of the available spectrum and also provides natural anonymity for the transmitting devices. The proposed approach further improves the performance of private wireless FL by reducing the amount of noise that must be injected.
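A small NumPy sketch of the noise-reduction effect, under an idealized superposition channel with unit gains (an assumption of ours): if each of N anonymous devices injects Gaussian noise with standard deviation sigma/sqrt(N), the superposed signal carries aggregate noise of standard deviation sigma, matching what central noise addition would provide while each device contributes N times less noise power.

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, dim, sigma_target = 20, 1000, 1.0

updates = [rng.standard_normal(dim) for _ in range(n_devices)]

# Each device perturbs its update with only sigma_target / sqrt(N) noise;
# the over-the-air sum of N such independent Gaussians has standard
# deviation sigma_target, i.e., the same aggregate perturbation as
# central noise addition at the PS.
per_device_std = sigma_target / np.sqrt(n_devices)
rx = sum(u + rng.normal(0, per_device_std, dim) for u in updates)
print(np.std(rx - sum(updates)))  # approximately sigma_target
```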