Haotian Zhang

WiFo-M$^2$: Plug-and-Play Multi-Modal Sensing via Foundation Model to Empower Wireless Communications

Jan 14, 2026
Via arXiv

WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks

Jan 14, 2026
Via arXiv

Look As You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning

Nov 15, 2025
Via arXiv

Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation

Nov 13, 2025
Via arXiv

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Nov 07, 2025
Via arXiv

Tiny-WiFo: A Lightweight Wireless Foundation Model for Channel Prediction via Multi-Component Adaptive Knowledge Distillation

Nov 06, 2025
Via arXiv

Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents

Sep 30, 2025
Via arXiv

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025
Via arXiv

FoBa: A Foreground-Background co-Guided Method and New Benchmark for Remote Sensing Semantic Change Detection

Sep 19, 2025
Via arXiv

Scaling Learned Image Compression Models up to 1 Billion

Aug 12, 2025
Via arXiv