Haiyang Xu

VideoNSA: Native Sparse Attention Scales Video Understanding

Oct 02, 2025

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Aug 21, 2025

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Jul 30, 2025

Megrez2 Technical Report

Jul 23, 2025

Perception-Aware Policy Optimization for Multimodal Reasoning

Jul 08, 2025

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Jun 05, 2025

VLM-R$^3$: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

May 22, 2025

Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation

May 21, 2025

Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

May 09, 2025