Nahema Marchal

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

Jun 19, 2024
Via arXiv

STAR: SocioTechnical Approach to Red Teaming Language Models

Jun 17, 2024
Via arXiv

A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

Apr 23, 2024
Via arXiv

Sociotechnical Safety Evaluation of Generative AI Systems

Oct 31, 2023
Via arXiv

Model evaluation for extreme risks

May 24, 2023
Via arXiv