Yisroel Mirsky

PRISM: Recovering Instruction Sets from Language Model Activations

Jun 08, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

May 14, 2026

LeakBoost: Perceptual-Loss-Based Membership Inference Attack

Feb 05, 2026

GAVEL: Towards rule-based safety through activation monitoring

Jan 29, 2026

Love, Lies, and Language Models: Investigating AI's Role in Romance-Baiting Scams

Dec 22, 2025

Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias

Aug 24, 2025

PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting

May 08, 2025

Memory Backdoor Attacks on Neural Networks

Nov 21, 2024

PEAS: A Strategy for Crafting Transferable Adversarial Examples

Oct 20, 2024

Efficient Model Extraction via Boundary Sampling

Oct 20, 2024