Yuanshun Yao

Label Inference Attack against Split Learning under Regression Setting

Jan 18, 2023
Shangyu Xie, Xin Yang, Yuanshun Yao, Tianyi Liu, Taiqing Wang, Jiankai Sun

As a crucial building block in vertical Federated Learning (vFL), Split Learning (SL) has proven practical for two-party collaborative model training, where one party holds the features of the data samples and the other party holds the corresponding labels. This method is claimed to be private because the only information shared is the embedding vectors and gradients, not the private raw data and labels. However, some recent works have shown that private labels can be leaked through the gradients. These existing attacks only work in the classification setting, where the private labels are discrete. In this work, we go a step further and study the leakage in the regression setting, where the private labels are continuous numbers (instead of discrete labels as in classification). The unbounded output range makes it harder for previous attacks to infer the continuous labels. To address this limitation, we propose a novel learning-based attack that integrates gradient information with extra learning regularization objectives based on properties of model training, and that can infer the labels under regression settings effectively. Comprehensive experiments on various datasets and models demonstrate the effectiveness of our proposed attack. We hope our work can pave the way for future analyses that make the vFL framework more secure.

* 9 pages

Learning to Counterfactually Explain Recommendations

Nov 17, 2022
Yuanshun Yao, Chong Wang, Hang Li

Recommender system practitioners are facing increasing pressure to explain recommendations. We explore how to explain recommendations using counterfactual logic, i.e., "Had you not interacted with the following items before, it is likely we would not recommend this item." Compared to traditional explanation logic, counterfactual explanations are easier to understand and more technically verifiable. The major challenge in generating such explanations is the computational cost, because it requires repeatedly retraining the models to measure the effect on a recommendation of removing part of the user's (interaction) history. We propose a learning-based framework to generate counterfactual explanations. The key idea is to train a surrogate model to learn the effect of removing a subset of user history on the recommendation. To this end, we first artificially simulate the counterfactual outcomes on the recommendation after deleting subsets of history. Then we train surrogate models to learn the mapping between a history deletion and the change in the recommendation caused by the deletion. Finally, to generate an explanation, we find the history subset that the surrogate model predicts is most likely to remove the recommendation. Through offline experiments and online user studies, we show that our method, compared to baselines, generates explanations that are more counterfactually valid and that users consider more satisfactory.

Evaluating Fairness Without Sensitive Attributes: A Framework Using Only Auxiliary Models

Oct 06, 2022
Zhaowei Zhu, Yuanshun Yao, Jiankai Sun, Yang Liu, Hang Li

Although the volume of literature and public attention on machine learning fairness has been growing significantly, in practice some tasks as basic as measuring fairness, which is the first step in studying and promoting fairness, can be challenging. This is because sensitive attributes are often unavailable due to privacy regulations. The straightforward solution is to use auxiliary models to predict the missing sensitive attributes. However, our theoretical analyses show that the estimation error of the directly measured fairness metrics is proportional to the error rates of the auxiliary models' predictions. Existing works that attempt to reduce the estimation error often require strong assumptions, e.g., access to the ground-truth sensitive attributes or some form of conditional independence. In this paper, we drop those assumptions and propose a framework that uses only off-the-shelf auxiliary models. The main challenge is how to reduce the negative impact of imperfectly predicted sensitive attributes on the fairness metrics without knowing the ground-truth sensitive attributes. Inspired by the noisy label learning literature, we first derive a closed-form relationship between the directly measured fairness metrics and their corresponding ground-truth metrics. We then estimate some key statistics (most importantly the transition matrix from the noisy label literature), which we use, together with the derived relationship, to calibrate the fairness metrics. In addition, we theoretically prove an upper bound on the estimation error of our calibrated metrics and show that our method can substantially decrease the estimation error, especially when auxiliary models are inaccurate or the target model is highly biased. Experiments on COMPAS and CelebA validate our theoretical analyses and show that our method can measure fairness significantly more accurately than baselines under favorable circumstances.
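
To make the role of the transition matrix concrete, here is a minimal sketch of calibrating a demographic parity gap. It is not the paper's estimator: it assumes a binary attribute, and it assumes the prediction-conditional transition matrices P(predicted attribute | true attribute, f) are already known, whereas the abstract describes estimating them. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def calibrate_joint(noisy_joint, transitions):
    """Recover P(f=k, A=a) from P(f=k, A_tilde=j).

    noisy_joint[k, j]    = P(f=k, A_tilde=j), measured with predicted attributes.
    transitions[k][a, j] = P(A_tilde=j | A=a, f=k), assumed known in this sketch.
    """
    noisy_joint = np.asarray(noisy_joint, dtype=float)
    clean = np.empty_like(noisy_joint)
    for k in range(noisy_joint.shape[0]):
        # Law of total probability: noisy_joint[k] = transitions[k].T @ clean[k].
        t_k = np.asarray(transitions[k], dtype=float)
        clean[k] = np.linalg.solve(t_k.T, noisy_joint[k])
    return clean

def demographic_parity_gap(joint):
    """|P(f=1 | group 0) - P(f=1 | group 1)| for a 2x2 joint table."""
    positive_rate = joint[1] / joint.sum(axis=0)
    return float(abs(positive_rate[0] - positive_rate[1]))

# Toy numbers (hypothetical): rows index the prediction f, columns the group.
true_joint = np.array([[0.2, 0.4],   # f = 0
                       [0.3, 0.1]])  # f = 1
transitions = [np.array([[0.8, 0.2], [0.3, 0.7]]),   # attribute noise when f = 0
               np.array([[0.9, 0.1], [0.2, 0.8]])]   # attribute noise when f = 1
noisy_joint = np.stack([transitions[k].T @ true_joint[k] for k in range(2)])

true_gap = demographic_parity_gap(true_joint)
measured_gap = demographic_parity_gap(noisy_joint)
calibrated_gap = demographic_parity_gap(calibrate_joint(noisy_joint, transitions))
```

In this toy setup the directly measured gap understates the true gap, and inverting the transition matrices recovers it exactly because the matrices are given rather than estimated.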

DPAUC: Differentially Private AUC Computation in Federated Learning

Aug 25, 2022
Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang

Federated learning (FL) has gained significant attention recently as a privacy-enhancing tool that lets multiple participants jointly train a machine learning model. Prior work on FL has mostly studied how to protect label privacy during model training. However, model evaluation in FL might also leak private label information. In this work, we propose an evaluation algorithm that can accurately compute the widely used AUC (area under the curve) metric when using label differential privacy (DP) in FL. Through extensive experiments, we show that our algorithm computes AUCs that are accurate relative to the ground truth.

Differentially Private Multi-Party Data Release for Linear Regression

Jun 18, 2022
Ruihan Wu, Xin Yang, Yuanshun Yao, Jiankai Sun, Tianyi Liu, Kilian Q. Weinberger, Chong Wang

Differentially Private (DP) data release is a promising technique to disseminate data without compromising the privacy of data subjects. However, the majority of prior work has focused on scenarios where a single party owns all the data. In this paper, we focus on the multi-party setting, where different stakeholders own disjoint sets of attributes belonging to the same group of data subjects. In the context of linear regression, where the goal is to allow all parties to train models on the complete data without being able to infer private attributes or identities of individuals, we start by directly applying the Gaussian mechanism and show that it suffers from the small eigenvalue problem. We then propose our novel method and prove that it asymptotically converges to the optimal (non-private) solutions as the dataset size increases. We substantiate the theoretical results through experiments on both artificial and real-world datasets.
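
The small eigenvalue problem can be illustrated with a short sketch. This is not the paper's construction: it assumes, purely for illustration, that Gaussian noise is added to the sufficient statistics X^T X and X^T y of a single pooled dataset, and the noise scale `sigma` is a hypothetical parameter that is not calibrated to any (epsilon, delta) budget.

```python
import numpy as np

def noisy_least_squares(X, y, sigma, rng):
    """Solve least squares from Gaussian-perturbed X^T X and X^T y.

    sigma is a hypothetical noise scale; it is NOT calibrated to a
    privacy budget in this sketch.
    """
    d = X.shape[1]
    gram = X.T @ X
    moment = X.T @ y
    # Symmetrize the noise so the perturbed Gram matrix stays symmetric.
    raw = rng.normal(0.0, sigma, size=(d, d))
    sym_noise = (raw + raw.T) / np.sqrt(2.0)
    noisy_gram = gram + sym_noise
    noisy_moment = moment + rng.normal(0.0, sigma, size=d)
    # Noise can push the smallest eigenvalue toward zero (or below it),
    # which makes the linear solve below ill-conditioned.
    min_eig = float(np.linalg.eigvalsh(noisy_gram).min())
    weights = np.linalg.solve(noisy_gram, noisy_moment)
    return weights, min_eig

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w_clean, eig_clean = noisy_least_squares(X, y, sigma=0.0, rng=rng)
w_noisy, eig_noisy = noisy_least_squares(X, y, sigma=50.0, rng=rng)
```

With `sigma=0.0` the function reduces to ordinary least squares. Comparing `eig_clean` with `eig_noisy` across seeds and noise scales shows how the perturbation moves the smallest eigenvalue of the Gram matrix, and how the solution degrades when that eigenvalue gets close to zero.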

* UAI 2022

Differentially Private AUC Computation in Vertical Federated Learning

May 24, 2022
Jiankai Sun, Xin Yang, Yuanshun Yao, Junyuan Xie, Di Wu, Chong Wang

Federated learning has gained great attention recently as a privacy-enhancing tool that lets multiple parties jointly train a machine learning model. As a sub-category, vertical federated learning (vFL) focuses on the scenario where features and labels are split across different parties. Prior work on vFL has mostly studied how to protect label privacy during model training. However, model evaluation in vFL might also leak private label information. One mitigation strategy is to apply label differential privacy (DP), but it yields poor estimates of the true (non-private) metrics. In this work, we propose two evaluation algorithms that can more accurately compute the widely used AUC (area under the curve) metric when using label DP in vFL. Through extensive experiments, we show that our algorithms achieve more accurate AUCs than the baselines.

Label Leakage and Protection from Forward Embedding in Vertical Federated Learning

Mar 04, 2022
Jiankai Sun, Xin Yang, Yuanshun Yao, Chong Wang

Vertical federated learning (vFL) has gained much attention and has been deployed to solve machine learning problems with data privacy concerns in recent years. However, some recent work has demonstrated that vFL is vulnerable to privacy leakage even though only the forward intermediate embedding (rather than raw features) and backpropagated gradients (rather than raw labels) are communicated between the involved participants. As the raw labels often contain highly sensitive information, some recent methods have been proposed to effectively prevent label leakage from the backpropagated gradients in vFL. However, these works only identified and defended against the threat of label leakage from the backpropagated gradients. None of them has paid attention to the problem of label leakage from the intermediate embedding. In this paper, we propose a practical label inference method that can effectively steal private labels from the shared intermediate embedding even when existing protection methods such as label differential privacy and gradient perturbation are applied. The effectiveness of the label attack is inseparable from the correlation between the intermediate embedding and the corresponding private labels. To mitigate label leakage from the forward embedding, we add an additional optimization goal at the label party that limits the adversary's label-stealing ability by minimizing the distance correlation between the intermediate embedding and the corresponding private labels. We conducted extensive experiments to demonstrate the effectiveness of our proposed protection methods.
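
For readers unfamiliar with the statistic, the following is a minimal NumPy sketch of the standard sample distance correlation between two batches. It only computes the quantity; how the paper turns it into a training objective is not shown here, and the variable names `embedding` and `labels` are hypothetical stand-ins for a batch of forward embeddings and their labels.

```python
import numpy as np

def _centered_distances(z):
    """Pairwise Euclidean distance matrix, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between two batches with the same length."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2_xy = (a * b).mean()
    dcov2_xx = (a * a).mean()
    dcov2_yy = (b * b).mean()
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0.0:
        return 0.0  # at least one input is constant
    # Clamp tiny negative values caused by floating-point error.
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

rng = np.random.default_rng(0)
embedding = rng.normal(size=(100, 4))   # hypothetical batch of embeddings
labels = embedding[:, 0] ** 2           # labels that depend nonlinearly on them
dcor_dependent = distance_correlation(embedding, labels)
```

Unlike Pearson correlation, distance correlation also picks up nonlinear dependence, which is why a low value is a reasonable target when the goal is to make labels hard to recover from the embedding.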

Differentially Private Label Protection in Split Learning

Mar 04, 2022
Xin Yang, Jiankai Sun, Yuanshun Yao, Junyuan Xie, Chong Wang

Split learning is a distributed training framework that allows multiple parties to jointly train a machine learning model over vertically partitioned data (partitioned by attributes). The idea is that only intermediate computation results, rather than private features and labels, are shared between parties, so that the raw training data remains private. Nevertheless, recent works have shown that the plaintext implementation of split learning suffers from severe privacy risks: a semi-honest adversary can easily reconstruct labels. In this work, we propose TPSL (Transcript Private Split Learning), a generic gradient-perturbation-based split learning framework that provides a provable differential privacy guarantee. Differential privacy is enforced not only on the model weights, but also on the messages communicated in the distributed computation setting. Our experiments on large-scale real-world datasets demonstrate the robustness and effectiveness of TPSL against label leakage attacks. We also find that TPSL has a better utility-privacy trade-off than baselines.
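
As a rough illustration of what gradient perturbation means in this setting, the sketch below applies the generic clip-then-add-Gaussian-noise recipe to the per-example gradients that a label party would send back to the feature party. It is not TPSL's actual mechanism; the function name and the `clip_norm` and `noise_multiplier` parameters are hypothetical, and no privacy accounting is performed.

```python
import numpy as np

def perturb_gradients(grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient to clip_norm, then add Gaussian noise.

    grads has shape (batch, dim): one gradient with respect to the
    embedding per example. The noise standard deviation is
    noise_multiplier * clip_norm (an assumption of this sketch).
    """
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down only the rows whose norm exceeds clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape)
    return clipped + noise

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0],    # norm 5.0, will be clipped
                  [0.3, 0.4]])   # norm 0.5, left unchanged
clipped_only = perturb_gradients(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = perturb_gradients(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single example can influence the message, and the noise then masks what remains; the trade-off between the two settings is the utility-privacy trade-off the abstract refers to.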

Counterfactually Evaluating Explanations in Recommender Systems

Mar 02, 2022
Yuanshun Yao, Chong Wang, Hang Li

Modern recommender systems face an increasing need to explain their recommendations. Despite considerable progress in this area, evaluating the quality of explanations remains a significant challenge for researchers and practitioners. Prior work mainly relies on human studies to evaluate explanation quality, which are usually expensive, time-consuming, and prone to human bias. In this paper, we propose an offline evaluation method that can be computed without human involvement. To evaluate an explanation, our method quantifies its counterfactual impact on the recommendation. To validate the effectiveness of our method, we carry out an online user study. We show that, compared to conventional methods, our method produces evaluation scores that are more correlated with real human judgments, and can therefore serve as a better proxy for human evaluation. In addition, we show that explanations with high evaluation scores are considered better by humans. Our findings highlight the promising direction of using the counterfactual approach as one possible way to evaluate recommendation explanations.

Defending against Reconstruction Attack in Vertical Federated Learning

Jul 21, 2021
Jiankai Sun, Yuanshun Yao, Weihao Gao, Junyuan Xie, Chong Wang

Recently, researchers have studied input leakage problems in Federated Learning (FL), where a malicious party can reconstruct sensitive training inputs provided by users from shared gradients. This raises concerns about FL, since input leakage contradicts the privacy-preserving intention of using FL. Despite a relatively rich literature on attacks and defenses for input reconstruction in Horizontal FL, input leakage and protection in Vertical FL have only recently started to draw researchers' attention. In this paper, we study how to defend against input leakage attacks in Vertical FL. We design an adversarial training-based framework that contains three modules: adversarial reconstruction, noise regularization, and distance correlation minimization. These modules can be employed individually or applied together, since they are independent of each other. Through extensive experiments on a large-scale industrial online advertising dataset, we show that our framework is effective in protecting input privacy while retaining model utility.

* Accepted to International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21)
