Nora Petrova

Latent Adversarial Training Improves the Representation of Refusal

Apr 26, 2025
Via arXiv

Characterizing stable regions in the residual stream of LLMs

Sep 26, 2024
Via arXiv