Factual Visual Question Answering


Personal AI Agent for Camera Roll VQA

Jun 03, 2026
Via arXiv

Enginuity: A Dataset and Benchmark for Vision-Language Understanding of Engineering Diagrams

Jun 02, 2026
Via arXiv

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

Jun 01, 2026
Via arXiv

Traceable Knowledge Graph Reasoning Enables LLM-Assisted Decision Support for Industrial VOCs in the Steel Industry

May 26, 2026
Via arXiv

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

May 20, 2026
Via arXiv

FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models

May 18, 2026
Via arXiv

Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models

Apr 23, 2026
Via arXiv

KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models

Mar 31, 2026
Via arXiv

MTA-Agent: An Open Recipe for Multimodal Deep Search Agents

Apr 07, 2026
Via arXiv

A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation

Apr 01, 2026
Via arXiv