David Bau

Locating and Editing Factual Associations in Mamba

Apr 04, 2024
Arnab Sen Sharma, David Atkinson, David Bau

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

Model Lakes

Mar 04, 2024
Koyena Pal, David Bau, Renée J. Miller

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Feb 22, 2024
Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

Measuring and Controlling Persona Drift in Language Model Dialogs

Feb 13, 2024
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

Testing Language Model Agents Safely in the Wild

Dec 03, 2023
Silen Naihin, David Atkinson, Marc Green, Merwane Hamadi, Craig Swift, Douglas Schonholtz, Adam Tauman Kalai, David Bau

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Nov 27, 2023
Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau
