Nandi Schoots

Soft Contamination Means Benchmarks Test Shallow Generalization
Feb 12, 2026

Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks
Mar 03, 2025

Modular Training of Neural Networks aids Interpretability
Feb 04, 2025

Open Problems in Mechanistic Interpretability
Jan 27, 2025

The Propensity for Density in Feed-forward Models
Oct 18, 2024

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Oct 02, 2024

Extending Activation Steering to Broad Skills and Multiple Behaviours
Mar 09, 2024

Dissecting Language Models: Machine Unlearning via Selective Pruning
Mar 02, 2024

Improving Activation Steering in Language Models with Mean-Centring
Dec 06, 2023

Comparing Optimization Targets for Contrast-Consistent Search
Nov 01, 2023