David Lindner

Evaluating Frontier Models for Dangerous Capabilities

Mar 20, 2024
Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Oct 19, 2023
Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback

Aug 08, 2023
Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell

Learning Safety Constraints from Demonstrations with Unknown Rewards

May 25, 2023
David Lindner, Xin Chen, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

Tracr: Compiled Transformers as a Laboratory for Interpretability

Jan 12, 2023
David Lindner, János Kramár, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik

Red-Teaming the Stable Diffusion Safety Filter

Oct 11, 2022
Javier Rando, Daniel Paleka, David Lindner, Lennard Heim, Florian Tramèr

Active Exploration for Inverse Reinforcement Learning

Jul 18, 2022
David Lindner, Andreas Krause, Giorgia Ramponi

Figure 1 for Active Exploration for Inverse Reinforcement Learning
Figure 2 for Active Exploration for Inverse Reinforcement Learning
Figure 3 for Active Exploration for Inverse Reinforcement Learning
Figure 4 for Active Exploration for Inverse Reinforcement Learning
Viaarxiv icon

Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning

Jun 27, 2022
David Lindner, Mennatallah El-Assady

Interactively Learning Preference Constraints in Linear Bandits

Jun 10, 2022
David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause
