Andrew Bai

Defending LLMs against Jailbreaking Attacks via Backtranslation

Feb 28, 2024
Yihan Wang, Zhouxing Shi, Andrew Bai, Cho-Jui Hsieh

Via arXiv

Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?

Feb 12, 2024
Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly

Via arXiv

Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Jan 21, 2024
Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

Via arXiv

Concept Gradient: Concept-based Interpretation Without Linear Assumption

Aug 31, 2022
Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh

Via arXiv