Factual Visual Question Answering


Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Jan 29, 2026
Via arXiv

Making medical vision-language models think causally across modalities with retrieval-augmented cross-modal reasoning

Jan 26, 2026
Via arXiv

Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models

Jan 27, 2026
Via arXiv

V-Loop: Visual Logical Loop Verification for Hallucination Detection in Medical Visual Question Answering

Jan 26, 2026
Via arXiv

MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation

Jan 21, 2026
Via arXiv

VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding

Jan 12, 2026
Via arXiv

MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark

Jan 05, 2026
Via arXiv

PENDULUM: A Benchmark for Assessing Sycophancy in Multimodal Large Language Models

Dec 22, 2025
Via arXiv

Improving VQA Reliability: A Dual-Assessment Approach with Self-Reflection and Cross-Model Verification

Dec 16, 2025
Via arXiv

VEGAS: Mitigating Hallucinations in Large Vision-Language Models via Vision-Encoder Attention Guided Adaptive Steering

Dec 12, 2025
Via arXiv