Nandi Schoots

Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks

Mar 03, 2025

Modular Training of Neural Networks aids Interpretability

Feb 04, 2025

Open Problems in Mechanistic Interpretability

Jan 27, 2025

The Propensity for Density in Feed-forward Models

Oct 18, 2024

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

Oct 02, 2024

Extending Activation Steering to Broad Skills and Multiple Behaviours

Mar 09, 2024

Dissecting Language Models: Machine Unlearning via Selective Pruning

Mar 02, 2024

Improving Activation Steering in Language Models with Mean-Centring

Dec 06, 2023

Comparing Optimization Targets for Contrast-Consistent Search

Nov 01, 2023

Any Deep ReLU Network is Shallow

Jun 20, 2023