Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ryuhei Miyazato

EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors

Apr 03, 2026

Ryuhei Miyazato, Shunsuke Kitada, Kei Harada

Abstract:Vision-Language Models (VLMs) excel at multimodal tasks, but they remain vulnerable to hallucinations that are factually incorrect or ungrounded in the input image. Recent work suggests that hallucination detection using internal representations is more efficient and accurate than approaches that rely solely on model outputs. However, existing internal-representation-based methods typically rely on a single representation or detector, limiting their ability to capture diverse hallucination signals. In this paper, we propose EnsemHalDet, an ensemble-based hallucination detection framework that leverages multiple internal representations of VLMs, including attention outputs and hidden states. EnsemHalDet trains independent detectors for each representation and combines them through ensemble learning. Experimental results across multiple VQA datasets and VLMs show that EnsemHalDet consistently outperforms prior methods and single-detector models in terms of AUC. These results demonstrate that ensembling diverse internal signals significantly improves robustness in multimodal hallucination detection.

Via

Access Paper or Ask Questions

BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

Nov 09, 2025

Ryuhei Miyazato, Ting-Ruen Wei, Xuyang Wu, Hsin-Tai Wu, Kei Harada

Figure 1 for BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

Figure 2 for BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

Figure 3 for BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

Figure 4 for BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

Abstract:Aspect-based summarization aims to generate summaries that highlight specific aspects of a text, enabling more personalized and targeted summaries. However, its application to books remains unexplored due to the difficulty of constructing reference summaries for long text. To address this challenge, we propose BookAsSumQA, a QA-based evaluation framework for aspect-based book summarization. BookAsSumQA automatically generates aspect-specific QA pairs from a narrative knowledge graph to evaluate summary quality based on its question-answering performance. Our experiments using BookAsSumQA revealed that while LLM-based approaches showed higher accuracy on shorter texts, RAG-based methods become more effective as document length increases, making them more efficient and practical for aspect-based book summarization.

Via

Access Paper or Ask Questions

Ensembling Multiple Hallucination Detectors Trained on VLLM Internal Representations

Oct 16, 2025

Yuto Nakamizo, Ryuhei Miyazato, Hikaru Tanabe, Ryuta Yamakura, Kiori Hatanaka

Abstract:This paper presents the 5th place solution by our team, y3h2, for the Meta CRAG-MM Challenge at KDD Cup 2025. The CRAG-MM benchmark is a visual question answering (VQA) dataset focused on factual questions about images, including egocentric images. The competition was contested based on VQA accuracy, as judged by an LLM-based automatic evaluator. Since incorrect answers result in negative scores, our strategy focused on reducing hallucinations from the internal representations of the VLM. Specifically, we trained logistic regression-based hallucination detection models using both the hidden_state and the outputs of specific attention heads. We then employed an ensemble of these models. As a result, while our method sacrificed some correct answers, it significantly reduced hallucinations and allowed us to place among the top entries on the final leaderboard. For implementation details and code, please refer to https://gitlab.aicrowd.com/htanabe/meta-comprehensive-rag-benchmark-starter-kit.

* 5th place solution at Meta KDD Cup 2025

Via

Access Paper or Ask Questions