Wenbo Guo

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

Jun 03, 2026
Via arXiv

Agents' Last Exam

Jun 03, 2026
Via arXiv

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

May 06, 2026
Via arXiv

ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

Apr 06, 2026
Via arXiv

The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey

Mar 11, 2026
Via arXiv

OpenSage: Self-programming Agent Generation Engine

Feb 18, 2026
Via arXiv

rePIRL: Learn PRM with Inverse RL for LLM Reasoning

Feb 08, 2026
Via arXiv

TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents

Feb 06, 2026
Via arXiv

To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack

Feb 01, 2026
Via arXiv

DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle

Jan 27, 2026
Via arXiv