Alan Chan

IDs for AI Systems

Jun 17, 2024
Via arXiv

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Via arXiv

Visibility into AI Agents

Feb 04, 2024
Via arXiv

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024
Via arXiv

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Dec 22, 2023
Via arXiv

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

Nov 06, 2023
Via arXiv

Welfare Diplomacy: Benchmarking Language Model Cooperation

Oct 13, 2023
Via arXiv

Towards the Scalable Evaluation of Cooperativeness in Language Models

Mar 16, 2023
Via arXiv

Scoring Rules for Performative Binary Prediction

Jul 05, 2022
Via arXiv

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Jul 17, 2021
Via arXiv