Alert button
Picture for Iason Gabriel

Iason Gabriel

Alert button

Sociotechnical Safety Evaluation of Generative AI Systems

Oct 31, 2023
Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, William Isaac

Viaarxiv icon

Model evaluation for extreme risks

May 24, 2023
Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

Figure 1 for Model evaluation for extreme risks
Figure 2 for Model evaluation for extreme risks
Figure 3 for Model evaluation for extreme risks
Figure 4 for Model evaluation for extreme risks
Viaarxiv icon

Manifestations of Xenophobia in AI Systems

Dec 15, 2022
Nenad Tomasev, Jonathan Leader Maynard, Iason Gabriel

Viaarxiv icon

A Human Rights-Based Approach to Responsible AI

Oct 06, 2022
Vinodkumar Prabhakaran, Margaret Mitchell, Timnit Gebru, Iason Gabriel

Figure 1 for A Human Rights-Based Approach to Responsible AI
Viaarxiv icon

Improving alignment of dialogue agents via targeted human judgements

Sep 28, 2022
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Soňa Mokrá, Nicholas Fernando, Boxi Wu, Rachel Foley, Susannah Young, Iason Gabriel, William Isaac, John Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, Geoffrey Irving

Figure 1 for Improving alignment of dialogue agents via targeted human judgements
Figure 2 for Improving alignment of dialogue agents via targeted human judgements
Figure 3 for Improving alignment of dialogue agents via targeted human judgements
Figure 4 for Improving alignment of dialogue agents via targeted human judgements
Viaarxiv icon

In conversation with Artificial Intelligence: aligning language models with human values

Sep 01, 2022
Atoosa Kasirzadeh, Iason Gabriel

Viaarxiv icon

Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models

Jun 16, 2022
Maribeth Rauh, John Mellor, Jonathan Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, Geoffrey Irving, Iason Gabriel, William Isaac, Lisa Anne Hendricks

Figure 1 for Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Figure 2 for Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Figure 3 for Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Viaarxiv icon

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Dec 08, 2021
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving

Figure 1 for Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Figure 2 for Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Figure 3 for Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Figure 4 for Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Viaarxiv icon

Ethical and social risks of harm from Language Models

Dec 08, 2021
Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

Figure 1 for Ethical and social risks of harm from Language Models
Figure 2 for Ethical and social risks of harm from Language Models
Viaarxiv icon