Weiyun Wang

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

Oct 14, 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Sep 18, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

Jun 04, 2025

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought

May 21, 2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Apr 21, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Mar 13, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Feb 25, 2025

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024