Alex Tamkin

Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions

Apr 21, 2025

Clio: Privacy-Preserving Insights into Real-World AI Use

Dec 18, 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

Jun 17, 2024

Collective Constitutional AI: Aligning a Language Model with Public Input

Jun 12, 2024

Bayesian Preference Elicitation with Language Models

Mar 08, 2024

Evaluating and Mitigating Discrimination in Language Model Decisions

Dec 06, 2023

Social Contract AI: Aligning AI Assistants with Implicit Group Norms

Oct 26, 2023

Codebook Features: Sparse and Discrete Interpretability for Neural Networks

Oct 26, 2023

Eliciting Human Preferences with Language Models

Oct 17, 2023

Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data

Sep 26, 2023