Martin Wattenberg

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

Jun 17, 2024

Designing a Dashboard for Transparency and Control of Conversational AI

Jun 12, 2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Feb 22, 2024

Measuring and Controlling Persona Drift in Language Model Dialogs

Feb 13, 2024

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

Jan 03, 2024

AI Alignment in the Design of Interactive AI: Specification Alignment, Process Alignment, and Evaluation Support

Oct 23, 2023

ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Sep 17, 2023

Emergent Linear Representations in World Models of Self-Supervised Sequence Models

Sep 07, 2023

Linearity of Relation Decoding in Transformer Language Models

Aug 17, 2023

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Jun 09, 2023