Sahar Abdelnabi

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

Nov 07, 2025
Via arXiv

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

Oct 30, 2025
Via arXiv

Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies

Oct 16, 2025
Via arXiv

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Jun 11, 2025
Via arXiv

Linear Control of Test Awareness Reveals Differential Compliance in Reasoning Models

May 20, 2025
Via arXiv

Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models

Feb 27, 2025
Via arXiv

Safety is Essential for Responsible Open-Ended Systems

Feb 06, 2025
Via arXiv

Hypothesizing Missing Causal Variables with LLMs

Sep 04, 2024
Via arXiv

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

Jun 12, 2024
Via arXiv

Are you still on track!? Catching LLM Task Drift with Activations

Jun 02, 2024
Via arXiv