Andrew Bai

Defending LLMs against Jailbreaking Attacks via Backtranslation

Feb 26, 2024
Yihan Wang, Zhouxing Shi, Andrew Bai, Cho-Jui Hsieh

Via arXiv

Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?

Feb 12, 2024
Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly

Via arXiv

Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Jan 21, 2024
Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

Via arXiv

Concept Gradient: Concept-based Interpretation Without Linear Assumption

Aug 31, 2022
Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh

Via arXiv