Sven Harms

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Jun 08, 2026
Via arXiv