Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

May 24, 2025

Jiayu Wang, Yang Jiao, Yue Yu, Tianwen Qian, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang

Figure 1 for OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

Figure 2 for OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

Figure 3 for OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

Figure 4 for OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

Share this with someone who'll enjoy it:

Abstract:Recent breakthroughs in large multimodal models (LMMs), such as the impressive GPT-4o-Native, have demonstrated remarkable proficiency in following general-purpose instructions for image generation. However, current benchmarks often lack the necessary breadth and depth to fully evaluate the diverse capabilities of these models. To overcome this limitation, we introduce OmniGenBench, a novel and comprehensive benchmark meticulously designed to assess the instruction-following abilities of state-of-the-art LMMs across both perception-centric and cognition-centric dimensions. Our OmniGenBench includes 57 diverse sub-tasks grounded in real-world scenarios, systematically categorized according to the specific model capabilities they demand. For rigorous evaluation, we further employ a dual-mode protocol. This protocol utilizes off-the-shelf visual parsing tools for perception-centric tasks and a powerful LLM-based judger for cognition-centric tasks to assess the alignment between generated images and user instructions. Using OmniGenBench, we evaluate mainstream generative models, including prevalent models like GPT-4o, Gemini-2.0-Flash, and Seedream, and provide in-depth comparisons and analyses of their performance.Code and data are available at https://github.com/emilia113/OmniGenBench.

View paper on

Share this with someone who'll enjoy it:

Title:OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks

Paper and Code