Fanqing Meng

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

May 29, 2025

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

May 19, 2025

CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

May 18, 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings

Mar 26, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Mar 10, 2025

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Oct 07, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Aug 05, 2024

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Jun 17, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Jun 12, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Apr 24, 2024