Jinkun Zhao

What Color Is It? A Text-Interference Multimodal Hallucination Benchmark

Nov 19, 2025

Jinkun Zhao, Lei Huang, Haixin Ge, Wenjun Wu

Abstract: With the rapid advancement of Large Models, numerous text-and-vision-fused Multimodal Large Models (MLMs) have emerged. However, these MLMs remain susceptible to informational interference in visual perception, particularly in color perception, which introduces an additional risk of hallucination. To validate this hypothesis, we introduce the "What Color Is It" dataset, a novel benchmark constructed using a simple method to trigger single-modality visual hallucination in MLMs. Based on this dataset, we further investigate the underlying causes of hallucination in the visual modality of MLMs and propose potential solutions to enhance their robustness.

Bench-CoE: a Framework for Collaboration of Experts from Benchmark

Dec 05, 2024

Yuanshuai Wang, Xingjian Zhang, Jinkun Zhao, Siwei Wen, Peilin Feng, Shuhao Liao, Lei Huang, Wenjun Wu

Abstract: Large Language Models (LLMs) are key technologies driving intelligent systems to handle multiple tasks. To meet the demands of various tasks, an increasing number of LLM-driven experts with diverse capabilities have been developed, accompanied by corresponding benchmarks to evaluate their performance. This paper proposes the Bench-CoE framework, which enables Collaboration of Experts (CoE) by effectively leveraging benchmark evaluations to achieve optimal performance across various tasks. Bench-CoE includes a set of expert models, a router for assigning tasks to corresponding experts, and a benchmark dataset for training the router. Moreover, we formulate Query-Level and Subject-Level approaches based on our framework, and analyze the merits and drawbacks of these two approaches. Finally, we conduct a series of experiments with varying data distributions on both language and multimodal tasks to validate that our proposed Bench-CoE outperforms any single model in terms of overall performance. We hope this method serves as a baseline for further research in this area. The code is available at https://github.com/ZhangXJ199/Bench-CoE.
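
The abstract does not spell out how the router is built, so the following is only a minimal sketch of one plausible reading of subject-level routing: use benchmark results to find which expert scores best on each subject, then send a new task to that expert. The record format, the function names `best_expert_per_subject` and `route`, and the fallback behavior are all assumptions for illustration, not the Bench-CoE implementation.

```python
from collections import defaultdict

def best_expert_per_subject(records):
    """Build a subject -> expert table from benchmark results.

    `records` is a hypothetical format: an iterable of
    (subject, expert_name, is_correct) tuples.
    """
    # (subject, expert) -> [number correct, number seen]
    totals = defaultdict(lambda: [0, 0])
    for subject, expert, is_correct in records:
        counts = totals[(subject, expert)]
        counts[0] += int(is_correct)
        counts[1] += 1

    best = {}  # subject -> (expert, accuracy)
    for (subject, expert), (correct, seen) in totals.items():
        accuracy = correct / seen
        # Strict comparison: on a tie, the expert seen first is kept.
        if subject not in best or accuracy > best[subject][1]:
            best[subject] = (expert, accuracy)
    return {subject: expert for subject, (expert, _) in best.items()}

def route(subject, table, default_expert):
    """Pick the expert for a task's subject, falling back for unseen subjects."""
    return table.get(subject, default_expert)
```

In this sketch the subject of an incoming task is taken as given; a real router would first have to predict it from the query, and a query-level variant would instead choose an expert per individual query.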
