
Wenhai Wang

CoMemo: LVLMs Need Image Context with Image Memory

Jun 06, 2025
Via arXiv

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

Jun 04, 2025
Via arXiv

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

May 29, 2025
Via arXiv

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

May 29, 2025
Via arXiv

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models

May 28, 2025
Via arXiv

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025
Via arXiv

Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity

May 17, 2025
Via arXiv

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

May 07, 2025
Via arXiv

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Apr 21, 2025
Via arXiv

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

Apr 16, 2025
Via arXiv