Alexander Robey

Evaluating Language Model Reasoning about Confidential Information

Aug 27, 2025

Command-V: Pasting LLM Behaviors via Activation Profiles

Jun 23, 2025

Benchmarking Misuse Mitigation Against Covert Adversaries

Jun 06, 2025

Transferable Adversarial Attacks on Black-Box Vision-Language Models

May 02, 2025

Antidistillation Sampling

Apr 17, 2025

Safety Guardrails for LLM-Enabled Robots

Mar 10, 2025

Jailbreaking LLM-Controlled Robots

Oct 17, 2024

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Mar 28, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024