Haotian Zhang

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024
Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

Via arXiv

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Apr 08, 2024
Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan

Via arXiv

Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis

Apr 01, 2024
Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi

Via arXiv

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Mar 22, 2024
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang

Via arXiv

Scheduling Drone and Mobile Charger via Hybrid-Action Deep Reinforcement Learning

Mar 16, 2024
Jizhe Dou, Haotian Zhang, Guodong Sun

Via arXiv

RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

Mar 12, 2024
Mingze Wang, Keyan Chen, Lili Su, Cilin Yan, Sheng Xu, Haotian Zhang, Pengcheng Yuan, Xiaolong Jiang, Baochang Zhang

Via arXiv

Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding

Mar 09, 2024
Cunhui Dong, Haichuan Ma, Haotian Zhang, Changsheng Gao, Li Li, Dong Liu

Via arXiv

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Feb 20, 2024
Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan

Via arXiv