
Haoqin Tu

Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation

Mar 17, 2026

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Mar 17, 2026

AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

Mar 05, 2026

SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards

Nov 10, 2025

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

May 07, 2025

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Apr 10, 2025

STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

Apr 02, 2025

ViLBench: A Suite for Vision-Language Process Reward Modeling

Mar 26, 2025

Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning

Feb 17, 2025