David Krueger

Learning to Forget using Hypernetworks

Dec 01, 2024
Via arXiv

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

Nov 11, 2024
Via arXiv

Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

Nov 07, 2024
Via arXiv

Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

Nov 07, 2024
Via arXiv

Predicting Future Actions of Reinforcement Learning Agents

Oct 29, 2024
Via arXiv

Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Oct 27, 2024
Via arXiv

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

Oct 22, 2024
Via arXiv

Influence Functions for Scalable Data Attribution in Diffusion Models

Oct 17, 2024
Via arXiv

Analyzing (In)Abilities of SAEs via Formal Languages

Oct 15, 2024
Via arXiv

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

Oct 11, 2024
Via arXiv