William Yeh

Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders

May 20, 2025
Via arXiv