Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hamed Haddadi

Imperial College London

Multimodal Federated Learning

Sep 10, 2021

Yuchen Zhao, Payam Barnaghi, Hamed Haddadi

Figure 1 for Multimodal Federated Learning

Figure 2 for Multimodal Federated Learning

Figure 3 for Multimodal Federated Learning

Figure 4 for Multimodal Federated Learning

Abstract:Federated learning is proposed as an alternative to centralized machine learning since its client-server structure provides better privacy protection and scalability in real-world applications. In many applications, such as smart homes with IoT devices, local data on clients are generated from different modalities such as sensory, visual, and audio data. Existing federated learning systems only work on local data from a single modality, which limits the scalability of the systems. In this paper, we propose a multimodal and semi-supervised federated learning framework that trains autoencoders to extract shared or correlated representations from different local data modalities on clients. In addition, we propose a multimodal FedAvg algorithm to aggregate local autoencoders trained on different data modalities. We use the learned global autoencoder for a downstream classification task with the help of auxiliary labelled data on the server. We empirically evaluate our framework on different modalities including sensory data, depth camera videos, and RGB camera videos. Our experimental results demonstrate that introducing data from multiple modalities into federated learning can improve its accuracy. In addition, we can use labelled data from only one modality for supervised learning on the server and apply the learned model to testing data from other modalities to achieve decent accuracy (e.g., approximately 70% as the best performance), especially when combining contributions from both unimodal clients and multimodal clients.

Via

Access Paper or Ask Questions

A Tandem Framework Balancing Privacy and Security for Voice User Interfaces

Jul 21, 2021

Ranya Aloufi, Hamed Haddadi, David Boyle

Figure 1 for A Tandem Framework Balancing Privacy and Security for Voice User Interfaces

Figure 2 for A Tandem Framework Balancing Privacy and Security for Voice User Interfaces

Figure 3 for A Tandem Framework Balancing Privacy and Security for Voice User Interfaces

Figure 4 for A Tandem Framework Balancing Privacy and Security for Voice User Interfaces

Abstract:Speech synthesis, voice cloning, and voice conversion techniques present severe privacy and security threats to users of voice user interfaces (VUIs). These techniques transform one or more elements of a speech signal, e.g., identity and emotion, while preserving linguistic information. Adversaries may use advanced transformation tools to trigger a spoofing attack using fraudulent biometrics for a legitimate speaker. Conversely, such techniques have been used to generate privacy-transformed speech by suppressing personally identifiable attributes in the voice signals, achieving anonymization. Prior works have studied the security and privacy vectors in parallel, and thus it raises alarm that if a benign user can achieve privacy by a transformation, it also means that a malicious user can break security by bypassing the anti-spoofing mechanism. In this paper, we take a step towards balancing two seemingly conflicting requirements: security and privacy. It remains unclear what the vulnerabilities in one domain imply for the other, and what dynamic interactions exist between them. A better understanding of these aspects is crucial for assessing and mitigating vulnerabilities inherent with VUIs and building effective defenses. In this paper,(i) we investigate the applicability of the current voice anonymization methods by deploying a tandem framework that jointly combines anti-spoofing and authentication models, and evaluate the performance of these methods;(ii) examining analytical and empirical evidence, we reveal a duality between the two mechanisms as they offer different ways to achieve the same objective, and we show that leveraging one vector significantly amplifies the effectiveness of the other;(iii) we demonstrate that to effectively defend from potential attacks against VUIs, it is necessary to investigate the attacks from multiple complementary perspectives(security and privacy).

* 14 pages, 6 figures

Via

Access Paper or Ask Questions

Revisiting IoT Device Identification

Jul 16, 2021

Roman Kolcun, Diana Andreea Popescu, Vadim Safronov, Poonam Yadav, Anna Maria Mandalari, Richard Mortier, Hamed Haddadi

Figure 1 for Revisiting IoT Device Identification

Figure 2 for Revisiting IoT Device Identification

Figure 3 for Revisiting IoT Device Identification

Figure 4 for Revisiting IoT Device Identification

Abstract:Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such, they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on their network behavior, while leveraging approaches previously proposed by other researchers. We compare the accuracy of four different previously proposed machine learning models (tree-based and neural network-based) for identifying IoT devices. We use packet trace data collected over a period of six months from a large IoT test-bed. We show that, while all models achieve high accuracy when evaluated on the same dataset as they were trained on, their accuracy degrades over time, when evaluated on data collected outside the training set. We show that on average the models' accuracy degrades after a couple of weeks by up to 40 percentage points (on average between 12 and 21 percentage points). We argue that, in order to keep the models' accuracy at a high level, these need to be continuously updated.

* To appear in TMA 2021 conference. 9 pages, 6 figures. arXiv admin note: text overlap with arXiv:2011.08605

Via

Access Paper or Ask Questions

Quantifying Information Leakage from Gradients

May 28, 2021

Fan Mo, Anastasia Borovykh, Mohammad Malekzadeh, Hamed Haddadi, Soteris Demetriou

Figure 1 for Quantifying Information Leakage from Gradients

Figure 2 for Quantifying Information Leakage from Gradients

Figure 3 for Quantifying Information Leakage from Gradients

Figure 4 for Quantifying Information Leakage from Gradients

Abstract:Sharing deep neural networks' gradients instead of training data could facilitate data privacy in collaborative learning. In practice however, gradients can disclose both private latent attributes and original data. Mathematical metrics are needed to quantify both original and latent information leakages from gradients computed over the training data. In this work, we first use an adaptation of the empirical $\mathcal{V}$-information to present an information-theoretic justification for the attack success rates in a layer-wise manner. We then move towards a deeper understanding of gradient leakages and propose more general and efficient metrics, using sensitivity and subspace distance to quantify the gradient changes w.r.t. original and latent information, respectively. Our empirical results, on six datasets and four models, reveal that gradients of the first layers contain the highest amount of original information, while the classifier/fully-connected layers placed after the feature extractor contain the highest latent information. Further, we show how training hyperparameters such as gradient aggregation can decrease information leakages. Our characterization provides a new understanding on gradient-based information leakages using the gradients' sensitivity w.r.t. changes in private information, and portends possible defenses such as layer-based protection or strong aggregation.

* 18 pages, 9 figures

Via

Access Paper or Ask Questions

Stronger Privacy for Federated Collaborative Filtering with Implicit Feedback

May 11, 2021

Lorenzo Minto, Moritz Haller, Hamed Haddadi, Benjamin Livshits

Figure 1 for Stronger Privacy for Federated Collaborative Filtering with Implicit Feedback

Figure 2 for Stronger Privacy for Federated Collaborative Filtering with Implicit Feedback

Figure 3 for Stronger Privacy for Federated Collaborative Filtering with Implicit Feedback

Figure 4 for Stronger Privacy for Federated Collaborative Filtering with Implicit Feedback

Abstract:Recommender systems are commonly trained on centrally collected user interaction data like views or clicks. This practice however raises serious privacy concerns regarding the recommender's collection and handling of potentially sensitive data. Several privacy-aware recommender systems have been proposed in recent literature, but comparatively little attention has been given to systems at the intersection of implicit feedback and privacy. To address this shortcoming, we propose a practical federated recommender system for implicit data under user-level local differential privacy (LDP). The privacy-utility trade-off is controlled by parameters $\epsilon$ and $k$, regulating the per-update privacy budget and the number of $\epsilon$-LDP gradient updates sent by each user respectively. To further protect the user's privacy, we introduce a proxy network to reduce the fingerprinting surface by anonymizing and shuffling the reports before forwarding them to the recommender. We empirically demonstrate the effectiveness of our framework on the MovieLens dataset, achieving up to Hit Ratio with K=10 (HR@10) 0.68 on 50k users with 5k items. Even on the full dataset, we show that it is possible to achieve reasonable utility with HR@10>0.5 without compromising user privacy.

* 9 pages, 5 figures

Via

Access Paper or Ask Questions

PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments

Apr 29, 2021

Fan Mo, Hamed Haddadi, Kleomenis Katevas, Eduard Marin, Diego Perino, Nicolas Kourtellis

Figure 1 for PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments

Figure 2 for PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments

Figure 3 for PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments

Figure 4 for PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments

Abstract:We propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems to limit privacy leakages in federated learning. Leveraging the widespread presence of Trusted Execution Environments (TEEs) in high-end and mobile devices, we utilize TEEs on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. Challenged by the limited memory size of current TEEs, we leverage greedy layer-wise training to train each model's layer inside the trusted area until its convergence. The performance evaluation of our implementation shows that PPFL can significantly improve privacy while incurring small system overheads at the client-side. In particular, PPFL can successfully defend the trained model against data reconstruction, property inference, and membership inference attacks. Furthermore, it can achieve comparable model utility with fewer communication rounds (0.54x) and a similar amount of network traffic (1.002x) compared to the standard federated learning of a complete model. This is achieved while only introducing up to ~15% CPU time, ~18% memory usage, and ~21% energy consumption overhead in PPFL's client-side.

* 15 pages, 8 figures, accepted to MobiSys 2021

Via

Access Paper or Ask Questions

Configurable Privacy-Preserving Automatic Speech Recognition

Apr 01, 2021

Ranya Aloufi, Hamed Haddadi, David Boyle

Figure 1 for Configurable Privacy-Preserving Automatic Speech Recognition

Figure 2 for Configurable Privacy-Preserving Automatic Speech Recognition

Figure 3 for Configurable Privacy-Preserving Automatic Speech Recognition

Figure 4 for Configurable Privacy-Preserving Automatic Speech Recognition

Abstract:Voice assistive technologies have given rise to far-reaching privacy and security concerns. In this paper we investigate whether modular automatic speech recognition (ASR) can improve privacy in voice assistive systems by combining independently trained separation, recognition, and discretization modules to design configurable privacy-preserving ASR systems. We evaluate privacy concerns and the effects of applying various state-of-the-art techniques at each stage of the system, and report results using task-specific metrics (i.e. WER, ABX, and accuracy). We show that overlapping speech inputs to ASR systems present further privacy concerns, and how these may be mitigated using speech separation and optimization techniques. Our discretization module is shown to minimize paralinguistics privacy leakage from ASR acoustic models to levels commensurate with random guessing. We show that voice privacy can be configurable, and argue this presents new opportunities for privacy-preserving applications incorporating ASR.

* 5 pages, 1 figure

Via

Access Paper or Ask Questions

The Case for Retraining of ML Models for IoT Device Identification at the Edge

Nov 17, 2020

Roman Kolcun, Diana Andreea Popescu, Vadim Safronov, Poonam Yadav, Anna Maria Mandalari, Yiming Xie, Richard Mortier, Hamed Haddadi

Figure 1 for The Case for Retraining of ML Models for IoT Device Identification at the Edge

Figure 2 for The Case for Retraining of ML Models for IoT Device Identification at the Edge

Figure 3 for The Case for Retraining of ML Models for IoT Device Identification at the Edge

Figure 4 for The Case for Retraining of ML Models for IoT Device Identification at the Edge

Abstract:Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on their network behavior, using resources available at the edge of the network. In this paper, we compare the accuracy of five different machine learning models (tree-based and neural network-based) for identifying IoT devices by using packet trace data from a large IoT test-bed, showing that all models need to be updated over time to avoid significant degradation in accuracy. In order to effectively update the models, we find that it is necessary to use data gathered from the deployment environment, e.g., the household. We therefore evaluate our approach using hardware resources and data sources representative of those that would be available at the edge of the network, such as in an IoT deployment. We show that updating neural network-based models at the edge is feasible, as they require low computational and memory resources and their structure is amenable to being updated. Our results show that it is possible to achieve device identification and categorization with over 80% and 90% accuracy respectively at the edge.

* 13 pages, 8 figures, 4 tables

Via

Access Paper or Ask Questions

Semi-supervised Federated Learning for Activity Recognition

Nov 06, 2020

Yuchen Zhao, Hanyang Liu, Honglin Li, Payam Barnaghi, Hamed Haddadi

Figure 1 for Semi-supervised Federated Learning for Activity Recognition

Figure 2 for Semi-supervised Federated Learning for Activity Recognition

Figure 3 for Semi-supervised Federated Learning for Activity Recognition

Figure 4 for Semi-supervised Federated Learning for Activity Recognition

Abstract:The proliferation of IoT sensors and edge devices makes it possible to use deep learning models to recognise daily activities locally using in-home monitoring technologies. Recently, federated learning systems that use edge devices as clients to collect and utilise IoT sensory data for human activity recognition have been commonly used as a new way to combine local (individual-level) and global (group-level) models. This approach provides better scalability and generalisability and also offers higher privacy compared with the traditional centralised analysis and learning models. The assumption behind federated learning, however, relies on supervised learning on clients. This requires a large volume of labelled data, which is difficult to collect in uncontrolled IoT environments such as remote in-home monitoring. In this paper, we propose an activity recognition system that uses semi-supervised federated learning, wherein clients conduct unsupervised learning on autoencoders with unlabelled local data to learn general representations, and a cloud server conducts supervised learning on an activity classifier with labelled data. Our experimental results show that using autoencoders and a long short-term memory (LSTM) classifier, the accuracy of our proposed system is comparable to that of a supervised federated learning system. Meanwhile, we demonstrate that our system is not affected by the Non-IID distribution of local data, and can even achieve better accuracy than supervised federated learning on some datasets. Additionally, we show that our proposed system can reduce the number of needed labels in the system and the size of local models without losing much accuracy, and has shorter local activity recognition time than supervised federated learning.

Via

Access Paper or Ask Questions

Paralinguistic Privacy Protection at the Edge

Nov 04, 2020

Ranya Aloufi, Hamed Haddadi, David Boyle

Figure 1 for Paralinguistic Privacy Protection at the Edge

Figure 2 for Paralinguistic Privacy Protection at the Edge

Figure 3 for Paralinguistic Privacy Protection at the Edge

Figure 4 for Paralinguistic Privacy Protection at the Edge

Abstract:Voice user interfaces and digital assistants are rapidly entering our homes and becoming integrated with all our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and raw audio signals collected through these devices contain a host of sensitive paralinguistic information that is transmitted to service providers regardless of deliberate or false triggers. As sensitive attributes like our identity, gender, indicators of mental health status, alongside moods, emotions and their temporal patterns, are easily inferred using deep acoustic models, we encounter a new generation of privacy risks by using these services. One approach to mitigate the risk of paralinguistic-based privacy breaches is to exploit a combination of cloud-based processing with privacy-preserving on-device paralinguistic information filtering prior to transmitting voice data. In this paper we introduce EDGY, a new lightweight disentangled representation learning model that transforms and filters high-dimensional voice data to remove sensitive attributes at the edge prior to offloading to the cloud. We evaluate EDGY's on-device performance, and explore optimization techniques, including model pruning and quantization, to enable private, accurate and efficient representation learning on resource-constrained devices. Our experimental results show that EDGY runs in tens of milliseconds with minimal performance penalties or accuracy losses in speech recognition using only a CPU and a single core ARM device without specialized hardware.

* 14 pages, 7 figures. arXiv admin note: text overlap with arXiv:2007.15064

Via

Access Paper or Ask Questions