Wenyuan Jiang

Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode

Apr 04, 2026
Via arXiv

Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems

Mar 01, 2026
Via arXiv