Stepan Shabalin

Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning

May 30, 2025
Via arXiv

Patterns and Mechanisms of Contrastive Activation Engineering

May 06, 2025
Via arXiv

Scaling sparse feature circuit finding for in-context learning

Apr 18, 2025
Via arXiv

Transcoders Beat Sparse Autoencoders for Interpretability

Jan 31, 2025
Via arXiv

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Apr 22, 2024
Via arXiv

Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

May 29, 2023
Via arXiv