Nina Rimsky

Investigating Bias Representations in Llama 2 Chat via Activation Steering

Feb 01, 2024
Dawn Lu, Nina Rimsky

Via arXiv

Steering Llama 2 via Contrastive Activation Addition

Dec 09, 2023
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner

Via arXiv