Abstract: Backdoor injection attacks threaten machine learning models trained on large datasets collected from untrusted sources; they enable attackers to inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and their impact on the benign learning task; however, an open question is how much poison data is needed for a successful backdoor attack. Typical attacks either use few samples but require extensive information about the data points, or need to poison many data points. In this paper, we formulate the one-poison hypothesis: an adversary with one poison sample and limited background knowledge can inject a backdoor with zero backdooring error and without significantly impacting the benign learning task performance. Moreover, we prove the one-poison hypothesis for linear regression and linear classification. For adversaries that place the poison sample along a direction unused by the benign data distribution, we show that the resulting model is functionally equivalent to a model trained without the poison. Building on prior work on statistical backdoor learning, we show that in all other cases the impact on the benign learning task is still limited. We also validate our theoretical results experimentally on realistic benchmark data sets.
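The unused-direction case can be illustrated with a toy experiment. The sketch below is not the construction from the paper. It assumes ordinary least squares on synthetic data whose last feature is always zero, and names such as `trigger` and `target_shift` are hypothetical. It fits a model with and without a single poison sample placed along that unused feature, then compares predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Benign data: 200 samples in 5 dimensions. The last feature is always zero,
# so that direction is unused by the benign distribution (an assumption).
n, d = 200, 5
X = rng.normal(size=(n, d))
X[:, -1] = 0.0
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

# One poison sample that lies purely along the unused direction.
trigger = np.zeros(d)
trigger[-1] = 1.0
target_shift = 10.0  # hypothetical attacker-chosen label for the poison sample

X_poisoned = np.vstack([X, trigger])
y_poisoned = np.append(y, target_shift)

# Minimum-norm least-squares fits without and with the poison sample.
w_clean, *_ = np.linalg.lstsq(X, y, rcond=None)
w_poisoned, *_ = np.linalg.lstsq(X_poisoned, y_poisoned, rcond=None)

# Benign test inputs never activate the last feature, so both models agree.
X_test = rng.normal(size=(50, d))
X_test[:, -1] = 0.0
benign_gap = np.max(np.abs(X_test @ w_clean - X_test @ w_poisoned))

# Adding the trigger to a benign input shifts the poisoned model's output.
backdoor_shift = (X_test + trigger) @ w_poisoned - X_test @ w_poisoned

print("max benign prediction gap:", benign_gap)
print("mean output shift under trigger:", backdoor_shift.mean())
```

In this toy setting the poison feature is orthogonal to every benign sample, so the normal equations decouple: the weights on the benign features are unchanged, and only the weight on the unused feature absorbs the poison label.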
Abstract: The adoption of machine learning solutions is rapidly increasing across all parts of society. Cloud service providers such as Amazon Web Services, Microsoft Azure, and the Google Cloud Platform are aggressively expanding their Machine-Learning-as-a-Service offerings. While the widespread adoption of machine learning has huge potential for both research and industry, the large-scale evaluation of possibly sensitive data on untrusted platforms carries inherent data security and privacy risks. Since computation time is expensive, performance is a critical factor for machine learning. However, the prevailing security measures proposed in recent years come with a significant performance overhead. We investigate the current state of protected distributed machine learning systems, focusing on deep convolutional neural networks. The most common and best-performing mixed MPC approaches are based on homomorphic encryption, secret sharing, and garbled circuits. They commonly suffer from communication overheads that grow linearly in the depth of the neural network. We present Dash, a fast and distributed private machine learning inference scheme. Dash is based purely on arithmetic garbled circuits. It requires only a single communication round per inference step, regardless of the depth of the neural network, and a very small constant communication volume. Dash thus significantly reduces performance requirements and scales better than previous approaches. In addition, we introduce the concept of LabelTensors, which allows us to use GPUs efficiently with garbled circuits and further reduces the runtime. Dash offers security against a malicious attacker and is up to 140 times faster than previous arithmetic garbling schemes.