Maarten Sap

Data Defenses Against Large Language Models

Oct 17, 2024

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Sep 26, 2024

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

Sep 13, 2024

On the Resilience of Multi-Agent Systems with Malicious Agents

Aug 02, 2024

Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance

Jul 10, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Jun 26, 2024

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs

May 27, 2024

PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

May 15, 2024

Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs

May 14, 2024

NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models

Apr 18, 2024