Ilias Leontiadis

EXACT: Extensive Attack for Split Learning

May 25, 2023
Xinchi Qiu, Ilias Leontiadis, Luca Melis, Alex Sablayrolles, Pierre Stock

Privacy-preserving machine learning (PPML) can help us train and deploy models that utilize private information. In particular, on-device machine learning allows us to avoid sharing information with a third-party server altogether during inference. However, on-device models are typically less accurate than their server-side counterparts because (1) they usually rely only on a small set of on-device features and (2) they must be small enough to run efficiently on end-user devices. Split Learning (SL) is a promising approach that can overcome these limitations. In SL, a large machine learning model is divided into two parts, with the larger part residing on the server side and a smaller part executing on-device, where it incorporates the private features. However, end-to-end training of such models requires exchanging gradients at the cut layer, which might encode private features or labels. In this paper, we provide insights into the potential privacy risks associated with SL and introduce a novel attack method, EXACT, to reconstruct private information. We also investigate the effectiveness of various mitigation strategies. Our results indicate that the gradients significantly improve the attacker's effectiveness across all three evaluated datasets, reaching almost 100% reconstruction accuracy for some features. However, a small amount of differential privacy (DP) is quite effective in mitigating this risk without causing significant training degradation.
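
To make the attack surface concrete, the toy sketch below illustrates one way a gradient-matching reconstruction could work, assuming the attacker can replay candidate (feature, label) pairs through a copy of the split model and compare the resulting cut-layer gradients with the one it observed. The architecture, data and helper names are illustrative assumptions, not the paper's actual EXACT implementation.

    # Toy sketch of a gradient-matching reconstruction attack on split learning
    # (illustrative only; model sizes, data and names are made up).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    client_head = nn.Sequential(nn.Linear(4, 8), nn.ReLU())  # on-device part
    server_tail = nn.Sequential(nn.Linear(8, 2))              # server part after the cut layer
    loss_fn = nn.CrossEntropyLoss()

    def cut_layer_gradient(x, y):
        """Gradient of the loss w.r.t. the activations exchanged at the cut layer."""
        h = client_head(x)
        loss = loss_fn(server_tail(h), y)
        (grad_h,) = torch.autograd.grad(loss, h)
        return grad_h

    x_private = torch.tensor([[0.2, 1.0, 0.0, 0.5]])  # the victim's private example
    y_private = torch.tensor([1])
    observed = cut_layer_gradient(x_private, y_private)

    # Exhaustive search over candidate (feature, label) pairs: keep the pair whose
    # cut-layer gradient is closest to the gradient the attacker observed.
    candidates = [(torch.rand(1, 4), torch.tensor([c])) for _ in range(200) for c in (0, 1)]
    candidates.append((x_private, y_private))  # included so the toy demo can succeed
    best_x, best_y = min(
        candidates,
        key=lambda fy: torch.norm(cut_layer_gradient(*fy) - observed).item())
    print("recovered label:", best_y.item())

Under this framing, the DP mitigation mentioned above would add calibrated noise to the exchanged gradient, so the nearest-gradient match no longer reliably points to the true pair.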

* 10 pages 

FEL: High Capacity Learning for Recommendation and Ranking via Federated Ensemble Learning

Jun 07, 2022
Meisam Hejazinia, Dzmitry Huba, Ilias Leontiadis, Kiwan Maeng, Mani Malek, Luca Melis, Ilya Mironov, Milad Nasr, Kaikai Wang, Carole-Jean Wu

Federated learning (FL) has emerged as an effective approach to address consumer privacy needs. FL has been successfully applied to certain machine learning tasks, such as training smart keyboard models and keyword spotting. Despite FL's initial success, many important deep learning use cases, such as ranking and recommendation, have remained out of reach for on-device learning. A key challenge for practical FL adoption in DL-based ranking and recommendation is the prohibitive resource requirement, which modern mobile systems cannot satisfy. We propose Federated Ensemble Learning (FEL) to tackle the large memory requirements of deep learning ranking and recommendation tasks. FEL enables large-scale ranking and recommendation model training on-device by simultaneously training multiple model versions on disjoint clusters of client devices. FEL integrates the trained sub-models via an over-arch layer into an ensemble model hosted on the server. Our experiments demonstrate that FEL yields a 0.43-2.31% model quality improvement over traditional on-device federated learning, a significant gain for ranking and recommendation system use cases.
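
A minimal sketch of the ensemble idea as described above, assuming a plain FedAvg loop within each cluster and a single linear over-arch layer trained on the server; all model sizes, synthetic data and hyperparameters are made up for illustration and are not the paper's setup.

    # Toy sketch: disjoint client clusters each train one sub-model via FedAvg,
    # then the server learns an "over-arch" layer on the concatenated outputs.
    import copy
    import torch
    import torch.nn as nn

    def make_submodel():
        return nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))

    def local_sgd(model, x, y, steps=5):
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.binary_cross_entropy_with_logits(model(x), y).backward()
            opt.step()
        return model.state_dict()

    def fedavg(states):
        return {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}

    # Three clusters; each (x, y) pair plays the role of one client's local data.
    clusters = [[(torch.randn(32, 10), torch.rand(32, 1).round()) for _ in range(4)]
                for _ in range(3)]
    submodels = [make_submodel() for _ in clusters]
    for model, cluster in zip(submodels, clusters):
        for _ in range(3):  # federated rounds within the cluster
            states = [local_sgd(copy.deepcopy(model), x, y) for x, y in cluster]
            model.load_state_dict(fedavg(states))

    # Server side: freeze the sub-models and fit the over-arch layer on their outputs.
    overarch = nn.Linear(len(submodels), 1)
    xs, ys = torch.randn(256, 10), torch.rand(256, 1).round()
    feats = torch.cat([m(xs).detach() for m in submodels], dim=1)
    opt = torch.optim.SGD(overarch.parameters(), lr=0.1)
    for _ in range(100):
        opt.zero_grad()
        nn.functional.binary_cross_entropy_with_logits(overarch(feats), ys).backward()
        opt.step()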


Smart at what cost? Characterising Mobile Deep Neural Networks in the wild

Sep 28, 2021
Mario Almeida, Stefanos Laskaridis, Abhinav Mehrotra, Lukasz Dudziak, Ilias Leontiadis, Nicholas D. Lane

With smartphones' omnipresence in people's pockets, Machine Learning (ML) on mobile is gaining traction as devices become more powerful. With applications ranging from visual filters to voice assistants, intelligence on mobile comes in many forms and facets. However, Deep Neural Network (DNN) inference remains a compute-intensive workload, with devices struggling to support intelligence at the cost of responsiveness. On the one hand, there is significant research on reducing model runtime requirements and supporting deployment on embedded devices. On the other hand, the drive to maximise task accuracy is served by ever deeper and wider neural networks, making mobile deployment of state-of-the-art DNNs a moving target. In this paper, we perform the first holistic study of DNN usage in the wild, tracking which models are deployed and how they run on widely used devices. To this end, we analyse over 16k of the most popular apps in the Google Play Store to characterise their DNN usage and performance across devices of different capabilities, both across tiers and generations. At the same time, we measure the models' energy footprint, a core cost dimension of any mobile deployment. To streamline the process, we developed gaugeNN, a tool that automates the deployment, measurement and analysis of DNNs on devices, with support for different frameworks and platforms. The results of our study paint the landscape of deep learning deployments on smartphones and indicate their popularity among app developers. Furthermore, our study shows the gap between bespoke techniques and real-world deployments, and the need for optimised deployment of deep learning models in a highly dynamic and heterogeneous ecosystem.
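
gaugeNN itself is not reproduced here, but the sketch below shows one of the steps such a tool would automate: scanning an APK (which is a zip archive) for packaged model files before benchmarking them. The extension list and the APK path are illustrative assumptions.

    # Hypothetical helper: list likely DNN model files packaged inside an APK.
    import zipfile

    MODEL_EXTENSIONS = (".tflite", ".pb", ".onnx", ".caffemodel", ".pt", ".param")

    def find_models(apk_path):
        """Return the names and sizes of likely DNN model files inside an APK."""
        with zipfile.ZipFile(apk_path) as apk:
            return [(info.filename, info.file_size)
                    for info in apk.infolist()
                    if info.filename.lower().endswith(MODEL_EXTENSIONS)]

    if __name__ == "__main__":
        for name, size in find_models("some_app.apk"):  # placeholder path
            print(f"{name}: {size / 1e6:.1f} MB")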

* Accepted at the ACM Internet Measurement Conference (IMC), 2021 

How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

Jun 21, 2021
Stylianos I. Venieris, Ioannis Panopoulos, Ilias Leontiadis, Iakovos S. Venieris

The unprecedented performance of deep neural networks (DNNs) has led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Nevertheless, deploying such AI models across commodity devices faces significant challenges: large computational cost, multiple performance objectives, hardware heterogeneity and a common need for high accuracy together pose critical obstacles to the deployment of DNNs across the various embedded and mobile devices in the wild. As such, we have yet to witness mainstream usage of state-of-the-art deep learning algorithms across consumer devices. In this paper, we provide preliminary answers to this potentially game-changing question by presenting an array of design techniques for efficient AI systems. We start by examining the major roadblocks when targeting both programmable processors and custom accelerators. Then, we present diverse methods for achieving real-time performance following a cross-stack approach, spanning model-, system- and hardware-level techniques and their combination. Our findings provide illustrative examples of AI systems that do not overburden mobile hardware, while also indicating how they can improve inference accuracy. Moreover, we showcase how custom ASIC- and FPGA-based accelerators can be an enabling factor for next-generation AI applications, such as multi-DNN systems. Collectively, these results highlight the critical need for further exploration of how the various cross-stack solutions can best be combined to bring the latest advances in deep learning close to users, in a robust and efficient manner.
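
As one concrete instance of the model-level techniques surveyed, the snippet below applies post-training dynamic quantization in PyTorch, shrinking linear layers to int8 for cheaper mobile inference; it is a generic example rather than code from the paper, and the toy model is an assumption.

    # Generic example of a model-level technique: post-training dynamic quantization.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 128)
    print("fp32 output:", model(x)[0, :3])
    print("int8 output:", quantized(x)[0, :3])  # close to fp32, with smaller weights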

* Invited paper at the 32nd IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), 2021 

DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device

Apr 20, 2021
Mario Almeida, Stefanos Laskaridis, Stylianos I. Venieris, Ilias Leontiadis, Nicholas D. Lane

Recently, there has been explosive growth in mobile and embedded applications using convolutional neural networks (CNNs). To alleviate their excessive computational demands, developers have traditionally resorted to cloud offloading, incurring high infrastructure costs and a strong dependence on networking conditions. At the other end, the emergence of powerful SoCs is gradually enabling on-device execution. Nonetheless, low- and mid-tier platforms still struggle to run state-of-the-art CNNs at sufficient performance. In this paper, we present DynO, a distributed inference framework that combines the best of both worlds to address several challenges, such as device heterogeneity, varying bandwidth and multi-objective requirements. Key components that enable this are its novel CNN-specific data packing method, which exploits the variability of precision needs in different parts of the CNN when onloading computation, and its novel scheduler that jointly tunes the partition point and the transferred data precision at run time to adapt inference to its execution environment. Quantitative evaluation shows that DynO outperforms the current state-of-the-art, improving throughput by over an order of magnitude over device-only execution and by up to 7.9x over competing CNN offloading systems, with up to 60x less data transferred.
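
The sketch below illustrates the kind of joint split-point and precision decision such a scheduler makes, using made-up per-layer latency and activation-size profiles; the real system derives these profiles at run time and optimises further objectives, so all numbers and names here are assumptions.

    # Toy scheduler: pick the split point and transfer precision that minimise
    # estimated end-to-end latency (device compute + transfer + cloud compute).
    layer_device_ms = [5, 12, 20, 35, 40]        # illustrative on-device cost per layer
    layer_cloud_ms  = [1, 2, 3, 5, 6]            # illustrative cloud cost per layer
    activation_mb   = [4.0, 2.0, 1.0, 0.5, 0.2]  # size of the tensor sent if we split after layer i
    precisions = {32: 1.0, 8: 0.25}              # bits -> relative size of packed activations

    def total_latency(split, bits, bandwidth_mbps=50.0):
        device = sum(layer_device_ms[:split])
        cloud = sum(layer_cloud_ms[split:])
        transfer = 0.0
        if split < len(layer_device_ms):  # something still runs in the cloud
            transfer = activation_mb[split - 1] * precisions[bits] * 8 / bandwidth_mbps * 1000
        return device + cloud + transfer

    best = min(((s, b) for s in range(1, len(layer_device_ms) + 1) for b in precisions),
               key=lambda sb: total_latency(*sb))
    print("best split after layer", best[0], "at", best[1], "bits:",
          round(total_latency(*best), 1), "ms")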

* Under review 

FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout

Mar 01, 2021
Samuel Horvath, Stefanos Laskaridis, Mario Almeida, Ilias Leontiadis, Stylianos I. Venieris, Nicholas D. Lane

Federated Learning (FL) has been gaining significant traction across different ML tasks, ranging from vision to keyboard predictions. In large-scale deployments, client heterogeneity is a fact and constitutes a primary problem for fairness, training performance and accuracy. Although significant efforts have been made towards tackling statistical data heterogeneity, the diversity in the processing capabilities and network bandwidth of clients, termed system heterogeneity, has remained largely unexplored. Current solutions either disregard a large portion of available devices or set a uniform limit on the model's capacity, restricted by the least capable participants. In this work, we introduce Ordered Dropout, a mechanism that achieves an ordered, nested representation of knowledge in neural networks and enables the extraction of lower-footprint submodels without the need for retraining. We further show that, for linear maps, Ordered Dropout is equivalent to SVD. We employ this technique, along with a self-distillation methodology, in the realm of FL in a framework called FjORD. FjORD alleviates the problem of client system heterogeneity by tailoring the model width to each client's capabilities. Extensive evaluation on both CNNs and RNNs across diverse modalities shows that FjORD consistently leads to significant performance gains over state-of-the-art baselines, while maintaining its nested structure.
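
A minimal sketch of the ordered-dropout idea, written from the description above rather than the FjORD codebase: a client with capability p uses only the first ceil(p * width) hidden units, so every smaller model is an exact prefix of the full one and can be extracted without retraining. The layer sizes and class name are illustrative.

    # Nested-width forward pass: slice a prefix of the hidden units at width p.
    import math
    import torch
    import torch.nn as nn

    class OrderedLinearBlock(nn.Module):
        def __init__(self, in_dim=16, hidden=32, out_dim=4):
            super().__init__()
            self.fc1 = nn.Linear(in_dim, hidden)
            self.fc2 = nn.Linear(hidden, out_dim)

        def forward(self, x, p=1.0):
            k = max(1, math.ceil(p * self.fc1.out_features))  # nested prefix of units
            h = torch.relu(nn.functional.linear(x, self.fc1.weight[:k], self.fc1.bias[:k]))
            return nn.functional.linear(h, self.fc2.weight[:, :k], self.fc2.bias)

    block = OrderedLinearBlock()
    x = torch.randn(8, 16)
    for p in (0.25, 0.5, 1.0):  # each client trains at a width it can afford
        print(f"p={p}: output shape {tuple(block(x, p).shape)}")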


It's always personal: Using Early Exits for Efficient On-Device CNN Personalisation

Feb 02, 2021
Ilias Leontiadis, Stefanos Laskaridis, Stylianos I. Venieris, Nicholas D. Lane

On-device machine learning is becoming a reality thanks to the availability of powerful hardware and model compression techniques. Typically, these models are pretrained on large GPU clusters and have enough parameters to generalise across a wide variety of inputs. In this work, we observe that a much smaller, personalised model can be employed to fit a specific scenario, resulting in both higher accuracy and faster execution. Nevertheless, on-device training is extremely challenging, imposing excessive computational and memory requirements even for flagship smartphones. At the same time, on-device data availability might be limited and samples are most frequently unlabelled. To this end, we introduce PersEPhonEE, a framework that attaches early exits to the model and personalises them on-device. These allow the model to progressively bypass a larger part of the computation as more personalised data become available. Moreover, we introduce an efficient on-device algorithm that trains the early exits in a semi-supervised manner at a fraction of the whole network's personalisation time. Results show that PersEPhonEE boosts accuracy by up to 15.9%, while reducing the training cost by up to 2.2x and inference latency by 2.2-3.2x on average at the same accuracy, depending on the availability of labels on-device.
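
The toy sketch below captures the semi-supervised early-exit personalisation described above, under the assumption that the frozen backbone's final exit supplies pseudo-labels for the user's unlabelled samples; it is illustrative and not the PersEPhonEE implementation, and the tiny MLP stands in for a real CNN.

    # Personalise only a small early exit on-device, using the frozen final exit
    # as a teacher for unlabelled local data.
    import torch
    import torch.nn as nn

    backbone1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())                      # early layers
    backbone2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))   # rest + final exit
    for p in list(backbone1.parameters()) + list(backbone2.parameters()):
        p.requires_grad_(False)  # pretrained and frozen on-device

    early_exit = nn.Linear(64, 10)  # the only part trained on-device
    opt = torch.optim.SGD(early_exit.parameters(), lr=0.05)

    unlabelled = torch.randn(256, 32)  # the user's local, unlabelled samples
    for _ in range(50):
        h = backbone1(unlabelled)
        pseudo = backbone2(h).argmax(dim=1)  # final exit provides pseudo-labels
        loss = nn.functional.cross_entropy(early_exit(h), pseudo)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # At inference time, confident early-exit predictions skip the rest of the network.
    probs = early_exit(backbone1(unlabelled[:1])).softmax(dim=1)
    print("exit early" if probs.max() > 0.9 else "run full model")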

* Accepted at the 22nd International Workshop on Mobile Computing Systems and Applications (HotMobile), 2021 

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

Aug 24, 2020
Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane

Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative is to offload CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
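
A minimal sketch of the progressive device-cloud inference described above, assuming a single early exit and a fixed confidence threshold; in the actual system the threshold and split point are co-optimised by the scheduler at run time, so the models, sizes and threshold here are illustrative.

    # Run up to an on-device early exit; offload the intermediate tensor to the
    # cloud part only when the exit's confidence is too low.
    import torch
    import torch.nn as nn

    device_part = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
    early_exit = nn.Linear(64, 10)  # cheap on-device classifier
    cloud_part = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

    def infer(x, confidence_threshold=0.8):
        h = device_part(x)
        probs = early_exit(h).softmax(dim=1)
        if probs.max().item() >= confidence_threshold:
            return probs.argmax(dim=1), "exited on device"
        # Otherwise ship the (much smaller) intermediate tensor h to the server.
        return cloud_part(h).argmax(dim=1), "offloaded to cloud"

    label, path = infer(torch.randn(1, 32))
    print(label.item(), path)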

* Accepted at the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 2020 