Gonçalo Paulo

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

May 17, 2025
Via arXiv

Transcoders Beat Sparse Autoencoders for Interpretability

Jan 31, 2025
Via arXiv

Partially Rewriting a Transformer in Natural Language

Jan 31, 2025
Via arXiv

Sparse Autoencoders Trained on the Same Data Learn Different Features

Jan 29, 2025
Via arXiv

Automatically Interpreting Millions of Features in Large Language Models

Oct 17, 2024
Via arXiv

Does Transformer Interpretability Transfer to RNNs?

Apr 09, 2024
Via arXiv