More and more orgainizations and institutions make efforts on using external data to improve the performance of AI services. To address the data privacy and security concerns, federated learning has attracted increasing attention from both academia and industry to securely construct AI models across multiple isolated data providers. In this paper, we studied the efficiency problem of adapting widely used XGBoost model in real-world applications to vertical federated learning setting. State-of-the-art vertical federated XGBoost frameworks requires large number of encryption operations and ciphertext transmissions, which makes the model training much less efficient than training XGBoost models locally. To bridge this gap, we proposed a novel batch homomorphic encryption method to cut the cost of encryption-related computation and transmission in nearly half. This is achieved by encoding the first-order derivative and the second-order derivative into a single number for encryption, ciphertext transmission, and homomorphic addition operations. The sum of multiple first-order derivatives and second-order derivatives can be simultaneously decoded from the sum of encoded values. We are motivated by the batch idea in the work of BatchCrypt for horizontal federated learning, and design a novel batch method to address the limitations of allowing quite few number of negative numbers. The encode procedure of the proposed batch method consists of four steps, including shifting, truncating, quantizing and batching, while the decoding procedure consists of de-quantization and shifting back. The advantages of our method are demonstrated through theoretical analysis and extensive numerical experiments.
The increasing concerns about data privacy and security drives the emergence of a new field of studying privacy-preserving machine learning from isolated data sources, i.e., \textit{federated learning}. Vertical federated learning, where different parties hold different features for common users, has a great potential of driving a more variety of business cooperation among enterprises in different fields. Decision tree models especially decision tree ensembles are a class of widely applied powerful machine learning models with high interpretability and modeling efficiency. However, the interpretability are compromised in these works such as SecureBoost since the feature names are not exposed to avoid possible data breaches due to the unprotected decision path. In this paper, we shall propose Fed-EINI, an efficient and interpretable inference framework for federated decision tree models with only one round of multi-party communication. We shall compute the candidate sets of leaf nodes based on the local data at each party in parallel, followed by securely computing the weight of the only leaf node in the intersection of the candidate sets. We propose to protect the decision path by the efficient additively homomorphic encryption method, which allows the disclosure of feature names and thus makes the federated decision trees interpretable. The advantages of Fed-EINI will be demonstrated through theoretical analysis and extensive numerical results. Experiments show that the inference efficiency is improved by over $50\%$ in average.