Alexander Yun

SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Mar 25, 2026
Via arXiv

Weight Updates as Activation Shifts: A Principled Framework for Steering

Feb 28, 2026
Via arXiv