Markus Anderljung

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

Via arXiv

Responsible Reporting for Frontier AI Development

Apr 03, 2024
Noam Kolt, Markus Anderljung, Joslyn Barnhart, Asher Brass, Kevin Esvelt, Gillian K. Hadfield, Lennart Heim, Mikel Rodriguez, Jonas B. Sandbrink, Thomas Woodside

Via arXiv

Visibility into AI Agents

Feb 04, 2024
Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung

Via arXiv

Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework

Nov 15, 2023
Markus Anderljung, Everett Thornton Smith, Joe O'Brien, Lisa Soder, Benjamin Bucknall, Emma Bluemke, Jonas Schuett, Robert Trager, Lacey Strahm, Rumman Chowdhury

Via arXiv

Frontier AI Regulation: Managing Emerging Risks to Public Safety

Jul 11, 2023
Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf

Via arXiv

Model evaluation for extreme risks

May 24, 2023
Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

Via arXiv

Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted?

Mar 29, 2023
Markus Anderljung, Julian Hazell

Via arXiv

The Brussels Effect and Artificial Intelligence: How EU regulation will impact the global AI market

Aug 23, 2022
Charlotte Siegmann, Markus Anderljung

Via arXiv