Video Question Answering


Video question answering (VideoQA) is the task of answering natural language questions about a given video. Given a video and a question in natural language, the model must produce an accurate answer grounded in the video's content.

CogStream: Context-guided Streaming Video Question Answering

Jun 12, 2025

Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering

Jun 12, 2025

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision

Jun 11, 2025

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jun 12, 2025

CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models

Jun 11, 2025

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought

Jun 12, 2025

Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos

Jun 11, 2025

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Jun 11, 2025

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

Jun 11, 2025

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Jun 10, 2025