Xintong Zhang

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025
Via arXiv

On Domain-Specific Post-Training for Multimodal Large Language Models

Nov 29, 2024
Via arXiv

CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update

Dec 18, 2023
Via arXiv

Enhance Reasoning Ability of Visual-Language Models via Large Language Models

May 22, 2023
Via arXiv