Haiyang Xu

AgentOCR: Reimagining Agent History via Optical Self-Compression

Jan 08, 2026

CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning

Dec 09, 2025

MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns

Nov 16, 2025

Efficient and Effective In-context Demonstration Selection with Coreset

Nov 12, 2025

Learning Filter-Aware Distance Metrics for Nearest Neighbor Search with Multiple Filters

Nov 06, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Oct 02, 2025

Mobile-Agent-v3: Fundamental Agents for GUI Automation

Aug 21, 2025

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Jul 30, 2025

Megrez2 Technical Report

Jul 23, 2025