Dawn Song

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Mar 19, 2024
Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li

Via arXiv

Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study

Mar 15, 2024
Chenguang Wang, Ruoxi Jia, Xin Liu, Dawn Song

Via arXiv

On the Societal Impact of Open Foundation Models

Feb 27, 2024
Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang, Arvind Narayanan

Via arXiv

Evolving AI Collectives to Enhance Human Diversity and Enable Self-Regulation

Feb 19, 2024
Shiyang Lai, Yujin Potter, Junsol Kim, Richard Zhuang, Dawn Song, James Evans

Via arXiv

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

Feb 12, 2024
Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li

Via arXiv

GRATH: Gradual Self-Truthifying for Large Language Models

Jan 31, 2024
Weixin Chen, Dawn Song, Bo Li

Via arXiv

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Nov 25, 2023
Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song

Via arXiv

Managing AI Risks in an Era of Rapid Progress

Oct 26, 2023
Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

Via arXiv