Matthew Kowal

Structuring Sparsity: Block-Sparse Featurizers Capture Visual Concept Manifolds

Jun 23, 2026
Via arXiv

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

Jun 10, 2026
Via arXiv

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

May 06, 2026
Via arXiv

Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution

Feb 16, 2026
Via arXiv

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

Feb 06, 2026
Via arXiv

Interpreting Physics in Video World Models

Feb 04, 2026
Via arXiv

Large language models can effectively convince people to believe conspiracies

Jan 08, 2026
Via arXiv

Emergent Persuasion: Will LLMs Persuade Without Being Prompted?

Dec 20, 2025
Via arXiv

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Feb 18, 2025
Via arXiv

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Feb 06, 2025
Via arXiv