Haotian Zhang

Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents

Sep 30, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025

FoBa: A Foreground-Background co-Guided Method and New Benchmark for Remote Sensing Semantic Change Detection

Sep 19, 2025

Scaling Learned Image Compression Models up to 1 Billion

Aug 12, 2025

LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning

Jun 14, 2025

Synesthesia of Machines (SoM)-Aided Online FDD Precoding via Heterogeneous Multi-Modal Sensing: A Vertical Federated Learning Approach

Jun 09, 2025

SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation

Jun 06, 2025

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

May 27, 2025

GENMO: A GENeralist Model for Human MOtion

May 02, 2025

The Fourth Monocular Depth Estimation Challenge

Apr 24, 2025