Jakob Foerster

PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition

May 14, 2024
Ziyang Zhang, Qizhen Zhang, Jakob Foerster

Risks and Opportunities of Open-Source Generative AI

May 14, 2024
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster

Select to Perfect: Imitating desired behavior from large multi-agent data

May 06, 2024
Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Foerster, Joao Henriques

Near to Mid-term Risks and Opportunities of Open Source Generative AI

Apr 25, 2024
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Jackson, Paul Röttger, Philip H. S. Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection

Apr 10, 2024
Linas Nasvytis, Kai Sandbrink, Jakob Foerster, Tim Franzmeyer, Christian Schroeder de Witt

Policy-Guided Diffusion

Apr 09, 2024
Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Foerster

JaxUED: A simple and useable UED library in Jax

Mar 19, 2024
Samuel Coward, Michael Beukman, Jakob Foerster

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Feb 26, 2024
Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Feb 26, 2024
Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster
