Xia Hu

Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-CodeX

Apr 16, 2026
Via arXiv

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

Apr 08, 2026
Via arXiv

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Apr 08, 2026
Via arXiv

ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety

Apr 02, 2026
Via arXiv

TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

Mar 16, 2026
Via arXiv

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Mar 12, 2026
Via arXiv

SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

Mar 02, 2026
Via arXiv

A Benchmark and Knowledge-Grounded Framework for Advanced Multimodal Personalization Study

Feb 22, 2026
Via arXiv

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Feb 16, 2026
Via arXiv

DeepSight: An All-in-One LM Safety Toolkit

Feb 12, 2026
Via arXiv