Joshua Engels

Building Production-Ready Probes For Gemini

Jan 16, 2026

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025

Dense SAE Latents Are Features, Not Bugs

Jun 18, 2025

Scaling Laws For Scalable Oversight

Apr 25, 2025

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Feb 23, 2025

Low-Rank Adapting Models for Sparse Autoencoders

Jan 31, 2025

Decomposing The Dark Matter of Sparse Autoencoders

Oct 18, 2024

Efficient Dictionary Learning with Switch Sparse Autoencoders

Oct 10, 2024

Not All Language Model Features Are Linear

May 23, 2024

Approximate Nearest Neighbor Search with Window Filters

Feb 01, 2024