Title: FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation

Jul 09, 2025

Liqiang Jing, Viet Lai, Seunghyun Yoon, Trung Bui, Xinya Du

Abstract: Video Multimodal Large Language Models (VideoMLLMs) have achieved remarkable progress in both Video-to-Text (V2T) and Text-to-Video (T2V) tasks. However, they often suffer from hallucinations, generating content that contradicts the visual input. Existing evaluation methods are limited to one task (e.g., V2T) and fail to assess hallucinations in open-ended, free-form responses. To address this gap, we propose FIFA, a unified FaIthFulness evAluation framework that extracts comprehensive descriptive facts, models their semantic dependencies via a Spatio-Temporal Semantic Dependency Graph, and verifies them using VideoQA models. We further introduce Post-Correction, a tool-based correction framework that revises hallucinated content. Extensive experiments demonstrate that FIFA aligns more closely with human judgment than existing evaluation methods, and that Post-Correction effectively improves factual consistency in both text and video generation.
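The abstract gives no implementation details, so the following is only a minimal sketch of how a dependency-aware faithfulness score might be computed once facts have been extracted and checked. Everything in it is an assumption made for illustration: the `Fact` class, the `faithfulness_score` function, the rule that a fact counts as verified only if all facts it depends on are verified, and the hard-coded boolean answers standing in for a VideoQA model. It is not the authors' scoring formula.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Fact:
    """A descriptive fact plus the ids of facts it depends on (hypothetical structure)."""
    text: str
    depends_on: list[str] = field(default_factory=list)


def faithfulness_score(facts: dict[str, Fact], answers: dict[str, bool]) -> float:
    """Fraction of facts that hold, where a fact holds only if its own check
    passed AND every fact it depends on also holds (an assumed rule)."""
    resolved: dict[str, bool] = {}

    def holds(fact_id: str) -> bool:
        if fact_id not in resolved:
            resolved[fact_id] = False  # provisional value; also breaks cycles
            own_check = answers.get(fact_id, False)  # missing answer = unverified
            parents_ok = all(holds(p) for p in facts[fact_id].depends_on)
            resolved[fact_id] = own_check and parents_ok
        return resolved[fact_id]

    if not facts:
        return 1.0  # nothing to contradict
    return sum(holds(fid) for fid in facts) / len(facts)


# Toy facts for a caption such as "A brown dog barks, then runs."
facts = {
    "dog": Fact("There is a dog."),
    "dog_brown": Fact("The dog is brown.", depends_on=["dog"]),
    "dog_runs": Fact("The dog runs.", depends_on=["dog"]),
    "runs_after_bark": Fact("The dog runs after barking.", depends_on=["dog_runs"]),
}

# Stand-in for VideoQA verdicts (hard-coded here, not produced by a real model).
answers = {"dog": True, "dog_brown": False, "dog_runs": True, "runs_after_bark": True}

score = faithfulness_score(facts, answers)
print(score)  # 0.75: three of the four facts hold
```

In this toy setup, marking `dog` as unverified would drive the score to zero, since every other fact depends on it. That is one way a dependency graph can keep a downstream fact from being credited when its premise is hallucinated.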
