Claude Code


EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

Apr 02, 2026
Via arXiv

PHMForge: A Scenario-Driven Agentic Benchmark for Industrial Asset Lifecycle Maintenance

Apr 02, 2026
Via arXiv

Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time

Apr 01, 2026
Via arXiv

VibeGuard: A Security Gate Framework for AI-Generated Code

Apr 01, 2026
Via arXiv

AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

Mar 31, 2026
Via arXiv

UK AISI Alignment Evaluation Case-Study

Apr 01, 2026
Via arXiv

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Mar 30, 2026
Via arXiv

AI-Generated Prior Authorization Letters: Strong Clinical Content, Weak Administrative Scaffolding

Mar 31, 2026
Via arXiv

Needle in the Repo: A Benchmark for Maintainability in AI-Generated Repository Edits

Mar 29, 2026
Via arXiv

APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

Mar 31, 2026
Via arXiv