Wenqi Shao

Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images

Jun 09, 2025
Via arXiv

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

Jun 04, 2025
Via arXiv

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

May 19, 2025
Via arXiv

CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

May 18, 2025
Via arXiv

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025
Via arXiv

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Apr 08, 2025
Via arXiv

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

Apr 02, 2025
Via arXiv

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Mar 19, 2025
Via arXiv

Car-1000: A New Large Scale Fine-Grained Visual Categorization Dataset

Mar 16, 2025
Via arXiv

PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models

Mar 16, 2025
Via arXiv