Alan Chan

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

Visibility into AI Agents

Feb 04, 2024
Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Dec 22, 2023
Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

Nov 06, 2023
Ross Gruetzemacher, Alan Chan, Kevin Frazier, Christy Manning, Štěpán Los, James Fox, José Hernández-Orallo, John Burden, Matija Franklin, Clíodhna Ní Ghuidhir, Mark Bailey, Daniel Eth, Toby Pilditch, Kyle Kilian

Welfare Diplomacy: Benchmarking Language Model Cooperation

Oct 13, 2023
Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan, Jesse Clifton

Towards the Scalable Evaluation of Cooperativeness in Language Models

Mar 16, 2023
Alan Chan, Maxime Riché, Jesse Clifton
