Junyuan Xie

DPAUC: Differentially Private AUC Computation in Federated Learning

Aug 25, 2022
Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang

Federated learning (FL) has recently gained significant attention as a privacy-enhancing tool that lets multiple participants jointly train a machine learning model. Prior work on FL has mostly studied how to protect label privacy during model training. However, model evaluation in FL might also lead to potential leakage of private label information. In this work, we propose an evaluation algorithm that can accurately compute the widely used AUC (area under the curve) metric when label differential privacy (DP) is used in FL. Through extensive experiments, we show that our algorithm computes AUCs close to the ground truth.
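
The abstract does not spell out the mechanism, so the sketch below is only an illustration of computing AUC under label DP, not necessarily the paper's algorithm: labels are protected with randomized response, and the per-bin positive/negative counts are debiased before the AUC is accumulated. The function names, the binning choice, and the toy data are assumptions.

```python
import numpy as np

def randomized_response(labels, eps, rng):
    """Flip each binary label with probability 1 / (1 + e^eps), giving eps label-DP."""
    p_flip = 1.0 / (1.0 + np.exp(eps))
    return np.where(rng.random(labels.shape) < p_flip, 1 - labels, labels)

def debiased_auc(scores, noisy_labels, eps, n_bins=1000):
    """Estimate AUC from randomized-response labels by debiasing per-bin counts."""
    p = 1.0 / (1.0 + np.exp(eps))                       # flip probability
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    pos = np.bincount(bins, weights=noisy_labels, minlength=n_bins)
    tot = np.bincount(bins, minlength=n_bins).astype(float)
    neg = tot - pos
    # E[noisy_pos] = (1-p)*true_pos + p*true_neg, so invert the 2x2 mixing.
    true_pos = ((1.0 - p) * pos - p * neg) / (1.0 - 2.0 * p)
    true_neg = tot - true_pos
    cum_neg = np.cumsum(true_neg) - true_neg            # negatives in strictly lower bins
    auc_num = np.sum(true_pos * (cum_neg + 0.5 * true_neg))  # within-bin ties count 0.5
    return auc_num / (true_pos.sum() * true_neg.sum())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100_000)
s = np.clip(0.6 * rng.random(100_000) + 0.4 * y, 0.0, 1.0)   # toy scores correlated with y
print(debiased_auc(s, randomized_response(y, eps=2.0, rng=rng), eps=2.0))
```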

Differentially Private AUC Computation in Vertical Federated Learning

May 24, 2022
Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang

Federated learning has recently gained significant attention as a privacy-enhancing tool that lets multiple parties jointly train a machine learning model. As a sub-category, vertical federated learning (vFL) focuses on the scenario where features and labels are split across different parties. Prior work on vFL has mostly studied how to protect label privacy during model training. However, model evaluation in vFL might also lead to potential leakage of private label information. One mitigation strategy is to apply label differential privacy (DP), but it yields poor estimates of the true (non-private) metrics. In this work, we propose two evaluation algorithms that can more accurately compute the widely used AUC (area under the curve) metric when label DP is used in vFL. Through extensive experiments, we show that our algorithms achieve more accurate AUCs than the baselines.
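
Again purely as an illustration rather than the paper's actual estimators, the label party in vFL could instead release Laplace-noised per-bin label counts and compute AUC from the noisy histogram. The sketch assumes binary labels, scores in [0, 1], and the Laplace mechanism for pure eps label-DP; all names are placeholders.

```python
import numpy as np

def auc_from_noisy_histogram(scores, labels, eps, n_bins=200, rng=None):
    """Bin the prediction scores, add Laplace noise to the per-bin positive and
    negative counts, and compute AUC from the noisy histogram. Flipping one label
    changes one positive count and one negative count by 1 each (L1 sensitivity 2),
    so Laplace(2/eps) noise on every count gives eps label-DP."""
    rng = rng or np.random.default_rng()
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    pos = np.bincount(bins, weights=labels, minlength=n_bins)
    neg = np.bincount(bins, minlength=n_bins) - pos
    pos = pos + rng.laplace(scale=2.0 / eps, size=n_bins)
    neg = neg + rng.laplace(scale=2.0 / eps, size=n_bins)
    cum_neg = np.cumsum(neg) - neg                  # negatives in strictly lower bins
    auc_num = np.sum(pos * (cum_neg + 0.5 * neg))   # within-bin ties count 0.5
    return auc_num / (pos.sum() * neg.sum())
```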

Differentially Private Label Protection in Split Learning

Mar 04, 2022
Xin Yang, Jiankai Sun, Yuanshun Yao, Junyuan Xie, Chong Wang

Split learning is a distributed training framework that allows multiple parties to jointly train a machine learning model over vertically partitioned data (partitioned by attributes). The idea is that only intermediate computation results, rather than private features and labels, are shared between parties, so the raw training data remains private. Nevertheless, recent work has shown that plaintext implementations of split learning suffer from severe privacy risks: a semi-honest adversary can easily reconstruct the labels. In this work, we propose \textsf{TPSL} (Transcript Private Split Learning), a generic gradient-perturbation-based split learning framework that provides a provable differential privacy guarantee. Differential privacy is enforced not only on the model weights but also on the messages communicated in the distributed computation setting. Our experiments on large-scale real-world datasets demonstrate the robustness and effectiveness of \textsf{TPSL} against label leakage attacks. We also find that \textsf{TPSL} has a better utility-privacy trade-off than the baselines.
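
\textsf{TPSL}'s exact mechanism is not described in the abstract; the sketch below only illustrates the generic pattern of perturbing the communicated message: the label party clips each example's cut-layer gradient and adds Gaussian noise before sending it back (DP-SGD style). Noise calibration and privacy accounting are omitted, and the shapes and toy top model are assumptions.

```python
import torch
import torch.nn.functional as F

def perturb_cut_gradient(grad, clip_norm=1.0, noise_multiplier=1.0):
    """Clip each example's cut-layer gradient to clip_norm and add Gaussian noise
    before it is sent back to the non-label party."""
    per_ex_norm = grad.flatten(1).norm(dim=1).clamp(min=1e-12)
    scale = (clip_norm / per_ex_norm).clamp(max=1.0).view(-1, *([1] * (grad.dim() - 1)))
    clipped = grad * scale
    return clipped + torch.randn_like(clipped) * noise_multiplier * clip_norm

# Label-party side of one split-learning step (toy top model, random data).
acts = torch.randn(32, 128, requires_grad=True)          # cut-layer activations received
w = torch.randn(128, 1)                                   # label party's top model
loss = F.binary_cross_entropy_with_logits((acts @ w).squeeze(1),
                                           torch.randint(0, 2, (32,)).float())
loss.backward()
protected_grad = perturb_cut_gradient(acts.grad)          # the message actually sent back
```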

Defending against Reconstruction Attack in Vertical Federated Learning

Jul 21, 2021
Jiankai Sun, Yuanshun Yao, Weihao Gao, Junyuan Xie, Chong Wang

Researchers have recently studied input leakage problems in Federated Learning (FL), where a malicious party can reconstruct sensitive training inputs provided by users from shared gradients. This raises concerns about FL, since input leakage contradicts the privacy-preserving intent of using it. Despite a relatively rich literature on input-reconstruction attacks and defenses in horizontal FL, input leakage and protection in vertical FL have only recently begun to draw researchers' attention. In this paper, we study how to defend against input leakage attacks in vertical FL. We design an adversarial-training-based framework that contains three modules: adversarial reconstruction, noise regularization, and distance correlation minimization. These modules can be employed individually or applied together, since they are independent of each other. Through extensive experiments on a large-scale industrial online advertising dataset, we show that our framework effectively protects input privacy while retaining model utility.
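
As a hypothetical sketch of how the three modules could be combined into a single objective for the feature-owning party (not the paper's exact formulation): penalize a reconstructor's success adversarially, train on noise-perturbed embeddings, and minimize the distance correlation between raw inputs and shared embeddings. The helper names and loss weights are assumptions, and the reconstructor would be trained in alternation to minimize its own error.

```python
import torch
import torch.nn.functional as F

def squared_distance_correlation(x, y):
    """Sample (squared) distance correlation between two batches of vectors."""
    def double_center(a):
        d = torch.cdist(a, a)
        return d - d.mean(dim=0, keepdim=True) - d.mean(dim=1, keepdim=True) + d.mean()
    A, B = double_center(x), double_center(y)
    return (A * B).mean() / ((A * A).mean().sqrt() * (B * B).mean().sqrt() + 1e-12)

def defender_loss(task_loss, inputs, embeddings, reconstructor,
                  lam_adv=1.0, lam_dcor=0.1, noise_std=0.05):
    """(1) adversarial reconstruction: make the attacker's reconstruction error large,
    (2) noise regularization: perturb the shared embeddings during training,
    (3) distance-correlation minimization between raw inputs and embeddings."""
    noisy_emb = embeddings + noise_std * torch.randn_like(embeddings)
    rec_err = F.mse_loss(reconstructor(noisy_emb), inputs)
    dcor = squared_distance_correlation(inputs, embeddings)
    return task_loss - lam_adv * rec_err + lam_dcor * dcor
```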

* Accepted to International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21) 

Vertical Federated Learning without Revealing Intersection Membership

Jun 10, 2021
Jiankai Sun, Xin Yang, Yuanshun Yao, Aonan Zhang, Weihao Gao, Junyuan Xie, Chong Wang

Vertical Federated Learning (vFL) allows multiple parties that own different attributes (e.g. features and labels) of the same data entity (e.g. a person) to jointly train a model. To prepare the training data, vFL needs to identify the common data entities shared by all parties. This is usually achieved with Private Set Intersection (PSI), which identifies the intersection of training samples across all parties by using personally identifiable information (e.g. email addresses) as sample IDs to align data instances. As a result, PSI makes the sample IDs of the intersection visible to all parties, so each party learns that the data entities in the intersection also appear in the other parties' data, i.e. intersection membership. However, many privacy-sensitive organizations, e.g. banks and hospitals, are prohibited from revealing the membership of their data entities. In this paper, we propose a vFL framework based on Private Set Union (PSU) that allows each party to keep sensitive membership information to itself. Instead of identifying the intersection of all training samples, our PSU protocol generates the union of samples as training instances. In addition, we propose strategies to generate synthetic features and labels to handle samples that belong to the union but not the intersection. Through extensive experiments on two real-world datasets, we show that our framework can protect the privacy of the intersection membership while maintaining the model utility.
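
A hypothetical sketch of the data-preparation idea only: align on the union of sample IDs (standing in for the output of a PSU protocol) and synthesize records for IDs a party does not hold. The Gaussian features and default-negative labels shown here are placeholders, not the paper's generation strategies.

```python
import numpy as np

def union_align(features_a, labels_b, feat_dim, rng=None):
    """Build training data over the UNION of sample IDs, filling the gaps with
    synthetic records so neither party learns which of its own IDs also appear
    on the other side."""
    rng = rng or np.random.default_rng()
    ids = sorted(set(features_a) | set(labels_b))
    feats, labels = [], []
    for i in ids:
        # Party A holds features; synthesize a feature vector for IDs it lacks.
        feats.append(features_a.get(i, rng.normal(size=feat_dim)))
        # Party B holds labels; here missing IDs simply receive a negative label.
        labels.append(labels_b.get(i, 0))
    return ids, np.stack(feats), np.array(labels)

# Toy usage: party A owns features for IDs {1, 2, 3}; party B owns labels for {2, 3, 4}.
A = {1: np.ones(4), 2: np.zeros(4), 3: np.ones(4)}
B = {2: 1, 3: 0, 4: 1}
ids, X, y = union_align(A, B, feat_dim=4)
```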

Label Leakage and Protection in Two-party Split Learning

Feb 17, 2021
Oscar Li, Jiankai Sun, Xin Yang, Weihao Gao, Hongyi Zhang, Junyuan Xie, Virginia Smith, Chong Wang

In vertical federated learning, two-party split learning has become an important topic and has found many applications in real business scenarios. However, how to prevent the participants' ground-truth labels from leaking has not been well studied. In this paper, we consider this question in an imbalanced binary classification setting, a common case in online business applications. We first show that the norm attack, a simple method that uses the norm of the gradients communicated between the parties, can largely reveal the participants' ground-truth labels. We then discuss several protection techniques to mitigate this issue. Among them, we design a principled approach that directly maximizes the worst-case error of label detection, which proves more effective in countering the norm attack and beyond. We experimentally demonstrate the competitiveness of our proposed method against several other baselines.
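
The norm attack itself is simple enough to sketch. Assuming the attacking party records the per-example gradients it receives through the cut layer, the gradient norm is used directly as a label score and leakage is quantified with AUC; the function name and the scikit-learn dependency are my choices, not the paper's code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def norm_attack_leak_auc(cut_gradients, true_labels):
    """With imbalanced binary labels, the gradient returned for a positive example
    tends to have a larger norm, so the per-example gradient norm itself scores the
    hidden label; leakage is the AUC of that score (0.5 means no leakage)."""
    scores = np.linalg.norm(cut_gradients.reshape(len(cut_gradients), -1), axis=1)
    return roc_auc_score(true_labels, scores)
```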

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

Jul 09, 2019
Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng

We present GluonCV and GluonNLP, deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. Benefiting from being open source under the Apache 2.0 license, GluonCV and GluonNLP have attracted 100 contributors worldwide on GitHub. Models from GluonCV and GluonNLP have been downloaded more than 1.6 million times in fewer than 10 months.
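
A minimal usage sketch of the pre-trained-model workflow the toolkits provide, using GluonCV's model zoo and ImageNet preset transforms; the model name and image path are placeholders.

```python
import mxnet as mx
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.imagenet import transform_eval

net = model_zoo.get_model('resnet50_v1b', pretrained=True)   # fetch pre-trained weights
img = transform_eval(mx.image.imread('example.jpg'))         # resize, crop, normalize, batch
probs = mx.nd.softmax(net(img))[0]
top5 = probs.topk(k=5)                                       # indices of the 5 best classes
```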

Bag of Freebies for Training Object Detection Neural Networks

Apr 12, 2019
Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

Training heuristics greatly improve the accuracy of various image classification models~\cite{he2018bag}. Object detection models, however, have more complex neural network structures and optimization targets, and their training strategies and pipelines vary dramatically across models. In this work, we explore training tweaks that apply to various models, including Faster R-CNN and YOLOv3. These tweaks do not change the model architectures; therefore, the inference costs remain the same. Our empirical results demonstrate that these freebies can improve absolute precision by up to 5% over state-of-the-art baselines.
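
The abstract does not enumerate the tweaks, so the following is only one representative architecture-preserving example from this line of work (not necessarily part of the paper's recipe): a linear-warmup cosine learning-rate schedule that drops into any detector's training loop.

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=0.01, warmup_steps=1000):
    """Linear warmup followed by cosine decay; changes neither the model
    architecture nor the inference cost."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```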

Bag of Tricks for Image Classification with Convolutional Neural Networks

Dec 05, 2018
Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

Much of the recent progress in image classification research can be credited to training procedure refinements, such as changes in data augmentation and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we examine a collection of such refinements and empirically evaluate their impact on final model accuracy through ablation studies. We show that, by combining these refinements, we can significantly improve various CNN models. For example, we raise ResNet-50's top-1 validation accuracy on ImageNet from 75.3% to 79.29%. We also demonstrate that improvements in image classification accuracy lead to better transfer learning performance in other application domains, such as object detection and semantic segmentation.
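
One refinement commonly examined in this line of work is label smoothing; the sketch below is illustrative (the smoothing factor is an assumed default) and spreads part of the target mass uniformly over all classes.

```python
import torch

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross entropy with label smoothing: weight 1 - eps on the one-hot target
    plus eps spread uniformly over all classes."""
    log_probs = torch.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)   # true-class term
    uniform = -log_probs.mean(dim=-1)                                # uniform term
    return ((1.0 - eps) * nll + eps * uniform).mean()
```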

* 10 pages, 9 tables, 4 figures 