Abstract:Today, machine learning is widely applied in sensitive, security-related, and financially lucrative applications. Model extraction attacks undermine current business models where a model owner sells model access, e.g., via MLaaS APIs. Additionally, stolen models can enable powerful white-box attacks, facilitating privacy attacks on sensitive training data, and model evasion. In this paper, we focus on Decision Trees (DT), which are widely deployed in practice. Existing black-box extraction attacks for DTs are either query-intensive, make strong assumptions about the DT structure, or rely on rich API information. To limit attacks to the black-box setting, CPU vendors introduced Trusted Execution Environments (TEE) that use hardware-mechanisms to isolate workloads from external parties, e.g., MLaaS providers. We introduce TrEEStealer, a high-fidelity extraction attack for stealing TEE-protected DTs. TrEEStealer exploits TEE-specific side-channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. The extraction efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. We use two primitives to acquire CFI: for AMD SEV, we follow previous work using the SEV-Step framework and performance counters. For Intel SGX, we reproduce prior findings on current Xeon 6 CPUs and construct a new primitive to efficiently extract the branch history of inference runs through the Branch-History-Register. We found corresponding vulnerabilities in three popular libraries: OpenCV, mlpack, and emlearn. We show that TrEEStealer achieves superior efficiency and extraction fidelity compared to prior attacks. Our work establishes a new state-of-the-art for DT extraction and confirms that TEEs fail to protect against control-flow leakage.




Abstract:With high-stakes machine learning applications increasingly moving to untrusted end-user or cloud environments, safeguarding pre-trained model parameters becomes essential for protecting intellectual property and user privacy. Recent advancements in hardware-isolated enclaves, notably Intel SGX, hold the promise to secure the internal state of machine learning applications even against compromised operating systems. However, we show that privileged software adversaries can exploit input-dependent memory access patterns in common neural network activation functions to extract secret weights and biases from an SGX enclave. Our attack leverages the SGX-Step framework to obtain a noise-free, instruction-granular page-access trace. In a case study of an 11-input regression network using the Tensorflow Microlite library, we demonstrate complete recovery of all first-layer weights and biases, as well as partial recovery of parameters from deeper layers under specific conditions. Our novel attack technique requires only 20 queries per input per weight to obtain all first-layer weights and biases with an average absolute error of less than 1%, improving over prior model stealing attacks. Additionally, a broader ecosystem analysis reveals the widespread use of activation functions with input-dependent memory access patterns in popular machine learning frameworks (either directly or via underlying math libraries). Our findings highlight the limitations of deploying confidential models in SGX enclaves and emphasise the need for stricter side-channel validation of machine learning implementations, akin to the vetting efforts applied to secure cryptographic libraries.