Haiyang Xu

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Aug 21, 2025
Via arXiv

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025
Via arXiv

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Jul 30, 2025
Via arXiv

Megrez2 Technical Report

Jul 23, 2025
Via arXiv

Perception-Aware Policy Optimization for Multimodal Reasoning

Jul 08, 2025
Via arXiv

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Jun 05, 2025
Via arXiv

VLM-R$^3$: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

May 22, 2025
Via arXiv

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

May 21, 2025
Via arXiv

Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

May 09, 2025
Via arXiv

Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning

May 01, 2025
Via arXiv