J Rosser

Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients

Mar 17, 2026
Via arXiv

Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions

Feb 11, 2026
Via arXiv

Mapping Faithful Reasoning in Language Models

Oct 25, 2025
Via arXiv

AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds

Feb 02, 2025
Via arXiv