
Yuejin Xie

Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-CodeX

Apr 16, 2026

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

Apr 08, 2026

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Apr 08, 2026

ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety

Apr 02, 2026

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Mar 04, 2026

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Jan 26, 2026

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

May 23, 2025

Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification

Jan 03, 2025