Yonatan Belinkov

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Jun 11, 2025
Via arXiv

SAEs Are Good for Steering -- If You Select the Right Features

May 26, 2025
Via arXiv

Language Models use Lookbacks to Track Beliefs

May 20, 2025
Via arXiv

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025
Via arXiv

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

Apr 01, 2025
Via arXiv

How Generative IR Retrieves Documents Mechanistically

Mar 25, 2025
Via arXiv

Inside-Out: Hidden Factual Knowledge in LLMs

Mar 19, 2025
Via arXiv

Are formal and functional linguistic mechanisms dissociated?

Mar 14, 2025
Via arXiv

Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps

Feb 20, 2025
Via arXiv

Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs

Feb 18, 2025
Via arXiv