Lingzhong Dong

Agent-ScanKit: Unraveling Memory and Reasoning of Multimodal Agents via Sensitivity Perturbations

Oct 02, 2025
Via arXiv

Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents

Oct 02, 2025
Via arXiv

GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents

May 19, 2025
Via arXiv

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

Jan 18, 2024
Via arXiv