Haiyang Xu

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Jun 05, 2025

VLM-R$^3$: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

May 22, 2025

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

May 21, 2025

Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

May 09, 2025

Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning

May 01, 2025

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Apr 17, 2025

Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration

Feb 25, 2025

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Feb 21, 2025

Megrez-Omni Technical Report

Feb 19, 2025

Qwen2.5-VL Technical Report

Feb 19, 2025