Haotian Zhang

Look As You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning

Nov 15, 2025

Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation

Nov 13, 2025

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Nov 07, 2025

Tiny-WiFo: A Lightweight Wireless Foundation Model for Channel Prediction via Multi-Component Adaptive Knowledge Distillation

Nov 06, 2025

Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents

Sep 30, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025

FoBa: A Foreground-Background co-Guided Method and New Benchmark for Remote Sensing Semantic Change Detection

Sep 19, 2025

Scaling Learned Image Compression Models up to 1 Billion

Aug 12, 2025

LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning

Jun 14, 2025

Synesthesia of Machines (SoM)-Aided Online FDD Precoding via Heterogeneous Multi-Modal Sensing: A Vertical Federated Learning Approach

Jun 09, 2025