
Patrick Wilhelm

From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents

Jun 04, 2026
Via arXiv

Revisiting Gradient Staleness: Evaluating Distance Metrics for Asynchronous Federated Learning Aggregation

Mar 09, 2026
Via arXiv

Noise-aware Client Selection for carbon-efficient Federated Learning via Gradient Norm Thresholding

Mar 04, 2026
Via arXiv

Monitoring Emergent Reward Hacking During Generation via Internal Activations

Mar 04, 2026
Via arXiv