Refusal-Feature-guided Teacher for Safe Finetuning via Data Filtering and Alignment Distillation

Jun 09, 2025


