Aidan Ewart

Eight Methods to Evaluate Robust Unlearning in LLMs

Feb 26, 2024
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell

Sparse Autoencoders Find Highly Interpretable Features in Language Models

Sep 19, 2023
Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey
