Alex D'Amour

Transforming and Combining Rewards for Aligning Large Language Models

Feb 01, 2024
Zihao Wang, Chirag Nagpal, Jonathan Berant, Jacob Eisenstein, Alex D'Amour, Sanmi Koyejo, Victor Veitch

Via arXiv

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Dec 21, 2023
Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alex D'Amour, DJ Dvijotham, Adam Fisch, Katherine Heller, Stephen Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant

Via arXiv

Detecting Extrapolation with Local Ensembles

Oct 21, 2019
David Madras, James Atwood, Alex D'Amour

Via arXiv

BriarPatches: Pixel-Space Interventions for Inducing Demographic Parity

Dec 17, 2018
Alexey A. Gritsenko, Alex D'Amour, James Atwood, Yoni Halpern, D. Sculley

Via arXiv