Yonatan Belinkov

Will it Merge? On The Causes of Model Mergeability

Jan 10, 2026
Via arXiv

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

Jan 08, 2026
Via arXiv

Structured RAG for Answering Aggregative Questions

Nov 11, 2025
Via arXiv

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

Oct 01, 2025
Via arXiv

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Jul 09, 2025
Via arXiv

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Jun 11, 2025
Via arXiv

SAEs Are Good for Steering -- If You Select the Right Features

May 26, 2025
Via arXiv

Language Models use Lookbacks to Track Beliefs

May 20, 2025
Via arXiv

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025
Via arXiv

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

Apr 01, 2025
Via arXiv