
Haoqin Tu

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Apr 06, 2026

Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory

Apr 02, 2026

Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation

Mar 17, 2026

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Mar 17, 2026

AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

Mar 05, 2026

SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards

Nov 10, 2025

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

May 07, 2025

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Apr 10, 2025

STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

Apr 02, 2025