Netta Madvil

Grounded in Context: Retrieval-Based Method for Hallucination Detection

Apr 22, 2025

Assaf Gerner, Netta Madvil, Nadav Barak, Alex Zaikman, Jonatan Liberman, Liron Hamra, Rotem Brazilay, Shay Tsadok, Yaron Friedman, Neal Harow (+3 more)

Abstract: Despite advancements in grounded content generation, production applications based on Large Language Models (LLMs) still suffer from hallucinated answers. We present "Grounded in Context", Deepchecks' hallucination detection framework, designed for production-scale long-context data and tailored to diverse use cases, including summarization, data extraction, and RAG. Inspired by the RAG architecture, our method integrates retrieval and Natural Language Inference (NLI) models to predict factual consistency between premises and hypotheses using an encoder-based model with only a 512-token context window. Our framework identifies unsupported claims with an F1 score of 0.83 on RAGTruth's response-level classification task, matching methods that were trained on the dataset and outperforming all comparable frameworks that use similar-sized models.
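As a rough illustration of the retrieve-then-verify idea the abstract describes, the sketch below splits a long context into windows of at most 512 whitespace tokens, retrieves the windows most similar to each claim, and scores each claim against them. It is not the Deepchecks implementation. All function names are hypothetical, the lexical-overlap retriever and the `toy_nli` scorer are simple stand-ins for a real retriever and an encoder-based NLI model, and the 0.8 threshold is an arbitrary assumption.

```python
import re
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens (a stand-in for a model tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_context(context: str, max_tokens: int = 512) -> List[str]:
    """Split the context into windows of at most max_tokens whitespace tokens."""
    words = context.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def retrieve(claim: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Rank chunks by lexical overlap with the claim (assumed retriever)."""
    claim_tokens = set(tokenize(claim))
    ranked = sorted(
        chunks,
        key=lambda chunk: len(claim_tokens & set(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:top_k]


def toy_nli(premise: str, hypothesis: str) -> float:
    """Placeholder entailment score: share of hypothesis tokens found in the premise.

    A real system would call an NLI model here; this is only an assumption
    that keeps the example self-contained.
    """
    hyp_tokens = tokenize(hypothesis)
    if not hyp_tokens:
        return 0.0
    premise_tokens = set(tokenize(premise))
    return sum(tok in premise_tokens for tok in hyp_tokens) / len(hyp_tokens)


def unsupported_claims(
    context: str,
    claims: List[str],
    nli: Callable[[str, str], float] = toy_nli,
    threshold: float = 0.8,  # arbitrary cut-off chosen for illustration
    max_tokens: int = 512,
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return (claim, best_score) for claims no retrieved chunk supports."""
    chunks = chunk_context(context, max_tokens)
    flagged = []
    for claim in claims:
        scores = (nli(premise, claim) for premise in retrieve(claim, chunks, top_k))
        best = max(scores, default=0.0)
        if best < threshold:
            flagged.append((claim, best))
    return flagged
```

Retrieving first means each premise passed to the scorer fits inside the small context window, which is what lets a short-context encoder be applied to long documents.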

Read, Look or Listen? What's Needed for Solving a Multimodal Dataset

Jul 06, 2023

Netta Madvil, Yonatan Bitton, Roy Schwartz

Abstract: The prevalence of large-scale multimodal datasets presents unique challenges in assessing dataset quality. We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it. Our method sheds light on the importance of different modalities in datasets, as well as the relationship between them. We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality. Moreover, we find that more than 70% of the questions are solvable using several different single-modality strategies, e.g., by either looking at the video or listening to the audio, highlighting the limited integration of multiple modalities in TVQA. We use our annotations to analyze MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, and also with auditory speaker identification. Based on our observations, we introduce a new test set that necessitates multiple modalities and observe a dramatic drop in model performance. Our methodology provides valuable insights into multimodal datasets and highlights the need for the development of more robust models.
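To make the kind of statistic reported above concrete, here is a minimal sketch of how per-question modality annotations could be summarized. It is not the authors' code. The data layout (each question mapped to a list of modality sets that each suffice to answer it), the function name, and the example annotations are all hypothetical.

```python
from typing import Dict, List, Set


def modality_stats(annotations: Dict[str, List[Set[str]]]) -> Dict[str, float]:
    """Summarize how many questions are answerable from a single modality.

    annotations maps a question id to a list of modality sets, where each
    set is assumed to be sufficient on its own to answer the question.
    """
    total = len(annotations)
    if total == 0:
        return {"single": 0.0, "multiple_single": 0.0}

    single = 0           # answerable with at least one single modality
    multiple_single = 0  # answerable with two or more different single modalities
    for strategies in annotations.values():
        single_modalities = {next(iter(s)) for s in strategies if len(s) == 1}
        if single_modalities:
            single += 1
        if len(single_modalities) >= 2:
            multiple_single += 1

    return {"single": single / total, "multiple_single": multiple_single / total}


# Hypothetical annotations, invented purely for illustration.
example = {
    "q1": [{"video"}, {"audio"}],  # either modality alone suffices
    "q2": [{"text"}],              # only the text suffices
    "q3": [{"video", "audio"}],    # needs both modalities together
    "q4": [{"text"}, {"video"}],   # either modality alone suffices
}
stats = modality_stats(example)
```

A question such as `q3`, whose only sufficient strategy combines two modalities, is the kind of instance a test set that necessitates multiple modalities would contain.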
