Publications by Scott A. Hale

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Apr 18, 2024
Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

Nov 14, 2023
Bertie Vidgen, Hannah Rose Kirk, Rebecca Qian, Nino Scherrer, Anand Kannappan, Scott A. Hale, Paul Röttger

Lost in Translation -- Multilingual Misinformation and its Evolution

Oct 27, 2023
Dorian Quelle, Calvin Cheng, Alexandre Bovet, Scott A. Hale

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Oct 11, 2023
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

Oct 03, 2023
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West

Sep 15, 2023
Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures

Aug 21, 2023
Hannah Rose Kirk, Angus R. Williams, Liam Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

Mar 09, 2023
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

Adaptable Claim Rewriting with Offline Reinforcement Learning for Effective Misinformation Discovery

Oct 14, 2022
Ashkan Kazemi, Artem Abzaliev, Naihao Deng, Rui Hou, Davis Liang, Scott A. Hale, Verónica Pérez-Rosas, Rada Mihalcea
