Caden Juang

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Jul 22, 2025
Via arXiv

Automatically Interpreting Millions of Features in Large Language Models

Oct 17, 2024
Via arXiv

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Jul 18, 2024
Via arXiv

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals

May 11, 2024
Via arXiv