David Bau

Do explanations generalize across large reasoning models?

Jan 16, 2026, via arXiv

In-Context Algebra

Dec 18, 2025, via arXiv

In-Context Learning Without Copying

Nov 07, 2025, via arXiv

LLMs Process Lists With General Filter Heads

Oct 30, 2025, via arXiv

LLMs Encode Harmfulness and Refusal Separately

Jul 16, 2025, via arXiv

Discovering Forbidden Topics in Language Models

May 26, 2025, via arXiv

When Are Concepts Erased From Diffusion Models?

May 22, 2025, via arXiv

Language Models use Lookbacks to Track Beliefs

May 20, 2025, via arXiv

Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions

May 13, 2025, via arXiv

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025, via arXiv