Factual Visual Question Answering


Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models

Mar 09, 2026

Seeing Clearly without Training: Mitigating Hallucinations in Multimodal LLMs for Remote Sensing

Mar 03, 2026

Multimodal Adaptive Retrieval Augmented Generation through Internal Representation Learning

Feb 28, 2026

NICO-RAG: Multimodal Hypergraph Retrieval-Augmented Generation for Understanding the Nicotine Public Health Crisis

Mar 02, 2026

Making medical vision-language models think causally across modalities with retrieval-augmented cross-modal reasoning

Jan 26, 2026

V-Loop: Visual Logical Loop Verification for Hallucination Detection in Medical Visual Question Answering

Jan 26, 2026

Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models

Jan 27, 2026

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Jan 29, 2026

VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding

Jan 12, 2026

MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation

Jan 21, 2026