Yeu-Tong Lau

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Mar 13, 2025

Applying sparse autoencoders to unlearn knowledge in language models

Oct 25, 2024

An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l

Oct 14, 2023