Martin Wattenberg

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Feb 22, 2024
Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

Measuring and Controlling Persona Drift in Language Model Dialogs

Feb 13, 2024
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

Jan 03, 2024
Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea

AI Alignment in the Design of Interactive AI: Specification Alignment, Process Alignment, and Evaluation Support

Oct 23, 2023
Michael Terry, Chinmay Kulkarni, Martin Wattenberg, Lucas Dixon, Meredith Ringel Morris

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Sep 17, 2023
Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman

Emergent Linear Representations in World Models of Self-Supervised Sequence Models

Sep 07, 2023
Neel Nanda, Andrew Lee, Martin Wattenberg

Linearity of Relation Decoding in Transformer Language Models

Aug 17, 2023
Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Jun 09, 2023
Yida Chen, Fernanda Viégas, Martin Wattenberg

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Jun 07, 2023
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
