Avital Balwit

Specific versus General Principles for Constitutional AI

Oct 20, 2023
Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson, Shannon Yang, Shauna Kravec, Timothy Telleen-Lawton, Thomas I. Liao, Tom Henighan, Tristan Hume, Zac Hatfield-Dodds, Sören Mindermann, Nicholas Joseph, Sam McCandlish, Jared Kaplan

Via arXiv

Aligned with Whom? Direct and social goals for AI systems

May 09, 2022
Anton Korinek, Avital Balwit

Via arXiv

Truthful AI: Developing and governing AI that does not lie

Oct 13, 2021
Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, William Saunders

Via arXiv