Picture for Wenhai Wang

Wenhai Wang

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Add code
Apr 21, 2025
Viaarxiv icon

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

Add code
Apr 16, 2025
Viaarxiv icon

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Add code
Apr 15, 2025
Viaarxiv icon

ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting

Add code
Apr 02, 2025
Viaarxiv icon

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Add code
Mar 27, 2025
Viaarxiv icon

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Add code
Mar 13, 2025
Viaarxiv icon

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Add code
Mar 10, 2025
Viaarxiv icon

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Add code
Feb 25, 2025
Viaarxiv icon

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Add code
Jan 21, 2025
Viaarxiv icon

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Add code
Dec 20, 2024
Viaarxiv icon