Alan Chan

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Via arXiv

Visibility into AI Agents

Feb 04, 2024
Via arXiv

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024
Via arXiv

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Dec 22, 2023
Via arXiv

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

Nov 06, 2023
Via arXiv

Welfare Diplomacy: Benchmarking Language Model Cooperation

Oct 13, 2023
Via arXiv

Towards the Scalable Evaluation of Cooperativeness in Language Models

Mar 16, 2023
Via arXiv

Scoring Rules for Performative Binary Prediction

Jul 05, 2022
Via arXiv

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Jul 17, 2021
Via arXiv

Parameter-free Gradient Temporal Difference Learning

May 10, 2021
Via arXiv