Wenqi Shao

MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams

Aug 09, 2025
Via arXiv

Cardiac-CLIP: A Vision-Language Foundation Model for 3D Cardiac CT Images

Jul 29, 2025
Via arXiv

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

Jul 24, 2025
Via arXiv

Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images

Jun 09, 2025
Via arXiv

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

Jun 04, 2025
Via arXiv

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

May 19, 2025
Via arXiv

CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

May 18, 2025
Via arXiv

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025
Via arXiv

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Apr 08, 2025
Via arXiv

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

Apr 02, 2025
Via arXiv