Alexander Robey

Command-V: Pasting LLM Behaviors via Activation Profiles
Jun 23, 2025

Benchmarking Misuse Mitigation Against Covert Adversaries
Jun 06, 2025

Transferable Adversarial Attacks on Black-Box Vision-Language Models
May 02, 2025

Antidistillation Sampling
Apr 17, 2025

Safety Guardrails for LLM-Enabled Robots
Mar 10, 2025

Jailbreaking LLM-Controlled Robots
Oct 17, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Mar 28, 2024

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Mar 28, 2024

A Safe Harbor for AI Evaluation and Red Teaming
Mar 07, 2024

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
Feb 28, 2024