Martin Wattenberg

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

Jun 17, 2024
Via arXiv

Designing a Dashboard for Transparency and Control of Conversational AI

Jun 12, 2024
Via arXiv

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Feb 22, 2024
Via arXiv

Measuring and Controlling Persona Drift in Language Model Dialogs

Feb 13, 2024
Via arXiv

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

Jan 03, 2024
Via arXiv

AI Alignment in the Design of Interactive AI: Specification Alignment, Process Alignment, and Evaluation Support

Oct 23, 2023
Via arXiv

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Sep 17, 2023
Via arXiv

Emergent Linear Representations in World Models of Self-Supervised Sequence Models

Sep 07, 2023
Via arXiv

Linearity of Relation Decoding in Transformer Language Models

Aug 17, 2023
Via arXiv

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Jun 09, 2023
Via arXiv