Dana Arad

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

Jan 08, 2026
Via arXiv

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Jun 11, 2025
Via arXiv

SAEs Are Good for Steering -- If You Select the Right Features

May 26, 2025
Via arXiv

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025
Via arXiv

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Mar 09, 2024
Via arXiv

ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

Jun 01, 2023
Via arXiv