
Andrew Bai

Defending LLMs against Jailbreaking Attacks via Backtranslation

Feb 28, 2024
Via arXiv

Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?

Feb 12, 2024
Via arXiv

Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Jan 21, 2024
Via arXiv

Concept Gradient: Concept-based Interpretation Without Linear Assumption

Aug 31, 2022
Via arXiv