Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector

Apr 18, 2024
Zhihao Xu, Ruixuan Huang, Xiting Wang, Fangzhao Wu, Jing Yao, Xing Xie
