Martin Wattenberg

Can Interpretation Predict Behavior on Unseen Data?

Jul 08, 2025

When Bad Data Leads to Good Models

May 07, 2025

The Geometry of Self-Verification in a Task-Specific Reasoning Model

Apr 19, 2025

Shared Global and Local Geometry of Language Model Embeddings

Mar 27, 2025

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Feb 18, 2025

Open Problems in Mechanistic Interpretability

Jan 27, 2025

ICLR: In-Context Learning of Representations

Dec 29, 2024

Relational Composition in Neural Networks: A Survey and Call to Action

Jul 19, 2024

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

Jun 17, 2024

Designing a Dashboard for Transparency and Control of Conversational AI

Jun 12, 2024