Hannah Brown

Self-Evaluation as a Defense Against Adversarial Attacks on LLMs

Jul 03, 2024

Single Character Perturbations Break LLM Alignment

Jul 03, 2024

Can AI Be as Creative as Humans?

Jan 12, 2024

Prompt Optimization via Adversarial In-Context Learning

Dec 05, 2023

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Oct 10, 2023

Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities

Jun 22, 2023

What Does it Mean for a Language Model to Preserve Privacy?

Feb 14, 2022