Jan Wehner

Immunization against harmful fine-tuning attacks

Feb 26, 2024
Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz

Explaining Learned Reward Functions with Counterfactual Trajectories

Feb 07, 2024
Jan Wehner, Frans Oliehoek, Luciano Cavalcante Siebert
