Hang Hua

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

May 18, 2026

Aurora: Unified Video Editing with a Tool-Using Agent

May 18, 2026

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

May 12, 2026

Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs

Apr 08, 2026

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

Mar 28, 2026

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs

Feb 06, 2026

DAVE: A VLM Vision Encoder for Document Understanding and Web Agents

Dec 19, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Oct 06, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

May 26, 2025

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

May 26, 2025