Handong Zhao

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
Feb 27, 2026

Seeing Through Words: Controlling Visual Retrieval Quality with Language Models
Feb 24, 2026

RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward
Feb 19, 2026

More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models
Dec 13, 2025

Rethinking the Text-Vision Reasoning Imbalance in MLLMs through the Lens of Training Recipes
Oct 26, 2025

Interactive Visualization Recommendation with Hier-SUCB
Feb 06, 2025

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
Jan 27, 2025

MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities
Jan 15, 2025

DynaSaur: Large Language Agents Beyond Predefined Actions
Nov 04, 2024

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs
Jul 02, 2024