Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

Jun 03, 2024
Figures 1-4 (images not reproduced)

View paper on arXiv