Bartosz Cywiński

Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation

Mar 05, 2026

Simple LLM Baselines are Competitive for Model Diffing

Feb 10, 2026

Towards eliciting latent knowledge from LLMs with mechanistic interpretability

May 20, 2025

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

Jan 31, 2025

GUIDE: Guidance-based Incremental Learning with Diffusion Models

Mar 06, 2024

Adapt & Align: Continual Learning with Generative Models Latent Space Alignment

Dec 21, 2023