Ben Nassi

The Adversarial Implications of Variable-Time Inference

Sep 05, 2023
Dudi Biton, Aditi Misra, Efrat Levy, Jaidip Kotak, Ron Bitton, Roei Schuster, Nicolas Papernot, Yuval Elovici, Ben Nassi

Machine learning (ML) models are known to be vulnerable to a number of attacks that target the integrity of their predictions or the privacy of their training data. To carry out these attacks, a black-box adversary must typically be able to query the model and observe its outputs (e.g., labels). In this work, we demonstrate, for the first time, the ability to enhance such decision-based attacks. To accomplish this, we present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack. The leakage of inference-state elements into algorithmic timing side channels has never been studied before, and we found that it can contain rich information that enables attacks which significantly outperform attacks based solely on label outputs. In a case study, we investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors. Examining the timing side-channel vulnerabilities of this algorithm, we identified the potential to enhance decision-based attacks. We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to evade object detection with adversarial examples and to perform dataset inference. Our experiments show that our adversarial examples exhibit superior perturbation quality compared to a decision-based attack. In addition, we present a new threat model in which dataset inference is performed based solely on timing leakage. To address the timing leakage vulnerability inherent in the NMS algorithm, we explore the potential and limitations of constant-time inference passes as a mitigation strategy.
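
The intuition is that the runtime of the post-processing step depends on the intermediate inference state: greedy NMS takes longer when more candidate boxes survive the confidence threshold. The snippet below is a minimal sketch of that dependence only, not the paper's measurement setup; the toy nms function, the box counts, and the IoU threshold are assumptions made for illustration.

```python
# Minimal sketch (not the paper's code): time a toy NMS implementation to show
# how execution time scales with the number of candidate boxes -- the kind of
# signal a black-box adversary could measure over many queries.
import time
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou <= iou_thr]
    return keep

rng = np.random.default_rng(0)
for n_candidates in (10, 100, 1000, 5000):
    xy = rng.uniform(0, 400, size=(n_candidates, 2))
    wh = rng.uniform(10, 60, size=(n_candidates, 2))
    boxes = np.hstack([xy, xy + wh])
    scores = rng.uniform(size=n_candidates)
    t0 = time.perf_counter()
    nms(boxes, scores)
    dt = (time.perf_counter() - t0) * 1e3
    print(f"{n_candidates:5d} candidate boxes -> NMS took {dt:.3f} ms")
```

An adversary able to time many queries could treat such measurements as a feedback signal that is richer than the final labels alone.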

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Jul 24, 2023
Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov

We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
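
Conceptually, the perturbation is obtained by gradient descent on the pixels (or audio samples) so that the model's output matches the attacker-chosen text. The sketch below shows only that generic PGD-style loop: a small, randomly initialized network stands in for the multi-modal LLM so the example runs end to end, and the L_inf budget, step size, and iteration count are illustrative assumptions rather than the paper's settings.

```python
# Illustrative PGD-style loop for embedding an "instruction" into an image.
# The real attack optimizes the teacher-forced loss of an attacker-chosen string
# under a multi-modal LLM (e.g., LLaVa); here a tiny random network stands in
# for that model so the sketch runs end to end.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "model": maps an image to a sequence of token logits. In the actual
# attack this would be the frozen multi-modal LLM scoring the target instruction
# given the perturbed image.
vocab, seq_len = 1000, 8
surrogate = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, stride=4), torch.nn.ReLU(), torch.nn.Flatten(),
    torch.nn.LazyLinear(seq_len * vocab),
)

def target_loss(image, target_ids):
    logits = surrogate(image).view(1, seq_len, vocab)
    return F.cross_entropy(logits.view(-1, vocab), target_ids.view(-1))

image = torch.rand(1, 3, 64, 64)                    # benign carrier image
target_ids = torch.randint(0, vocab, (1, seq_len))  # attacker-chosen "instruction"
delta = torch.zeros_like(image, requires_grad=True)
eps, alpha = 8 / 255, 1 / 255                       # L_inf budget and step size

for step in range(200):
    loss = target_loss((image + delta).clamp(0, 1), target_ids)
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()  # descend on the target-text loss
        delta.clamp_(-eps, eps)             # keep the perturbation small
        delta.grad.zero_()

print("final loss:", float(loss))
```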

Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models

Nov 24, 2022
Jacob Shams, Ben Nassi, Ikuya Morikawa, Toshiya Shimizu, Asaf Shabtai, Yuval Elovici

In recent years, various watermarking methods have been proposed to detect computer vision models obtained illegitimately from their owners; however, they fail to demonstrate satisfactory robustness against model extraction attacks. In this paper, we present an adaptive framework for watermarking a protected model, leveraging the unique behavior the model exhibits due to the random seed used to initialize its training. This watermark is used to detect extracted models, which exhibit the same unique behavior, indicating unauthorized use of the protected model's intellectual property (IP). First, we show how the initial seed used for random number generation during model training produces distinct characteristics in the model's decision boundaries, which are inherited by extracted models and present in their decision boundaries, but are not present in non-extracted models trained on the same dataset with a different seed. Based on our findings, we propose the Robust Adaptive Watermarking (RAW) Framework, which utilizes the unique behavior present in the protected and extracted models to generate a watermark key-set and verification model. We show that the framework is robust to (1) unseen model extraction attacks and (2) extracted models that undergo a blurring method (e.g., weight pruning). We evaluate the framework's robustness against a naive attacker (unaware that the model is watermarked) and an informed attacker (who employs blurring strategies to remove the watermarked behavior from an extracted model), achieving outstanding (i.e., >0.9) AUC values. Finally, we show that the framework is robust to model extraction attacks that use a different structure and/or architecture than the protected model.

* 9 pages, 6 figures, 3 tables 
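
As a rough illustration of the underlying observation rather than the RAW framework itself, the sketch below trains a protected model, an independently seeded model, and a distilled "extracted" surrogate on a toy dataset, then compares how often each agrees with the protected model on points near its decision boundary; the toy dataset, MLP models, and key-selection heuristic are assumptions made for the example.

```python
# Toy illustration: models trained with different random seeds carve out
# different decision boundaries, a distilled (extracted) model inherits the
# protected model's boundary, and points near that boundary can serve as a
# watermark key-set.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)

protected = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=1).fit(X, y)
independent = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=2).fit(X, y)

# "Model extraction": train a surrogate on the protected model's predicted labels.
X_query, _ = make_moons(n_samples=2000, noise=0.25, random_state=3)
extracted = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=4)
extracted.fit(X_query, protected.predict(X_query))

# Watermark key-set: probe points where the protected model is least confident,
# i.e., points close to its decision boundary.
probe = np.random.default_rng(5).uniform(-2, 3, size=(20000, 2))
margin = np.abs(protected.predict_proba(probe)[:, 1] - 0.5)
keys = probe[np.argsort(margin)[:200]]

ref = protected.predict(keys)
print("agreement with extracted model  :", (extracted.predict(keys) == ref).mean())
print("agreement with independent model:", (independent.predict(keys) == ref).mean())
```

The gap in agreement on such boundary-hugging points is the kind of signal a key-set-based verification procedure can exploit.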
EyeDAS: Securing Perception of Autonomous Cars Against the Stereoblindness Syndrome

May 13, 2022
Efrat Levy, Ben Nassi, Raz Swissa, Yuval Elovici

The ability to detect whether an object is a 2D or 3D object is extremely important in autonomous driving, since a detection error can have life-threatening consequences, endangering the safety of the driver, passengers, pedestrians, and others on the road. Methods proposed to distinguish between 2D and 3D objects (e.g., liveness detection methods) are not suitable for autonomous driving, because they are object-dependent or do not consider the constraints associated with autonomous driving (e.g., the need for real-time decision-making while the vehicle is moving). In this paper, we present EyeDAS, a novel few-shot learning-based method aimed at securing an object detector (OD) against the threat posed by the stereoblindness syndrome (i.e., the inability to distinguish between 2D and 3D objects). We evaluate EyeDAS's real-time performance using 2,000 objects extracted from seven YouTube video recordings of street views taken by a dash cam from the driver's seat perspective. When applied as a countermeasure to seven state-of-the-art ODs, EyeDAS reduced the 2D misclassification rate from 71.42-100% to 2.4% with a 3D misclassification rate of 0% (TPR of 1.0). We also show that EyeDAS outperforms the baseline method, achieving an AUC of over 0.999 and a TPR of 1.0 with an FPR of 0.024.
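
As a hedged sketch of the general few-shot idea, not EyeDAS itself, the example below trains a tiny classifier on summary features of an object crop tracked across consecutive frames; the synthetic crops and the inter-frame-variation features are illustrative assumptions, not the paper's expert models or dash-cam pipeline.

```python
# Hedged sketch: a few-shot "2D vs. 3D" meta-classifier over simple per-object
# features. The features (inter-frame variation of an object crop over time)
# are illustrative stand-ins for cues a real system would compute from video.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def crop_features(crops):
    """crops: (T, H, W) grayscale crops of one tracked object over T frames."""
    diffs = np.abs(np.diff(crops.astype(float), axis=0))
    return np.array([diffs.mean(), diffs.std(), crops.std()])

def synth_object(is_3d):
    base = rng.uniform(0, 1, size=(32, 32))
    # Toy assumption: 3D objects vary more between frames (parallax, shading)
    # than flat 2D objects such as images on billboards.
    jitter = 0.15 if is_3d else 0.02
    return np.stack([base + rng.normal(0, jitter, base.shape) for _ in range(8)])

# Few-shot training set: a handful of labeled objects.
X = np.array([crop_features(synth_object(label)) for label in (0, 1) * 10])
y = np.array([0, 1] * 10)
clf = LogisticRegression().fit(X, y)

test = np.array([crop_features(synth_object(1)), crop_features(synth_object(0))])
print("predicted labels (1 = 3D, 0 = 2D):", clf.predict(test))
```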

Handwritten Signature Verification Using Hand-Worn Devices

Dec 19, 2016
Ben Nassi, Alona Levy, Yuval Elovici, Erez Shmueli

Online signature verification technologies, such as those available in banks and post offices, rely on dedicated digital devices such as tablets or smart pens to capture, analyze, and verify signatures. In this paper, we propose a novel method for online signature verification that relies on increasingly available hand-worn devices, such as smartwatches or fitness trackers, instead of dedicated ad hoc devices. Our method uses a set of known genuine and forged signatures, recorded using the motion sensors of a hand-worn device, to train a machine learning classifier. Then, given the recording of an unknown signature and a claimed identity, the classifier can determine whether the signature is genuine or forged. To validate our method, we applied it to 1,980 recordings of genuine and forged signatures collected from 66 subjects at our institution. Using our method, we were able to distinguish between genuine and forged signatures with a high degree of accuracy (0.98 AUC and 0.05 EER).
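
A minimal sketch of this kind of pipeline, assuming synthetic six-axis motion recordings and simple summary-statistic features (neither of which comes from the paper), might look as follows; the random-forest classifier is likewise an assumption made for illustration.

```python
# Minimal sketch of the general approach (not the paper's feature set or model):
# summary statistics are extracted from a hand-worn device's motion recording of
# a signature, and a classifier separates genuine signatures from forgeries.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def features(recording):
    """recording: (T, 6) accelerometer + gyroscope samples for one signature."""
    return np.concatenate([recording.mean(axis=0), recording.std(axis=0),
                           np.abs(np.diff(recording, axis=0)).mean(axis=0)])

def synth_signature(genuine):
    # Toy assumption: forgeries are slower and less consistent than the
    # user's own signing motion.
    t = np.linspace(0, 1, 200)[:, None]
    freq, noise = (6.0, 0.05) if genuine else (4.0, 0.3)
    return np.sin(2 * np.pi * freq * t * np.arange(1, 7)) + rng.normal(0, noise, (200, 6))

X = np.array([features(synth_signature(g)) for g in [True] * 30 + [False] * 30])
y = np.array([1] * 30 + [0] * 30)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```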
