
Alex Tamkin

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

Jun 17, 2024

Collective Constitutional AI: Aligning a Language Model with Public Input

Jun 12, 2024

Bayesian Preference Elicitation with Language Models

Mar 08, 2024

Evaluating and Mitigating Discrimination in Language Model Decisions

Dec 06, 2023

Social Contract AI: Aligning AI Assistants with Implicit Group Norms

Oct 26, 2023

Codebook Features: Sparse and Discrete Interpretability for Neural Networks

Oct 26, 2023

Eliciting Human Preferences with Language Models

Oct 17, 2023

Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data

Sep 26, 2023

Studying Large Language Model Generalization with Influence Functions

Aug 07, 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Jun 28, 2023