Video Question Answering


Video question answering (VideoQA) is the task of answering natural-language questions about a given video. Given a video and a question, the model produces an answer grounded in the content of the video.

Frame Sampling Strategies Matter: A Benchmark for small vision language models

Sep 18, 2025
Via arXiv

Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark

Sep 17, 2025
Via arXiv

MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

Sep 17, 2025
Via arXiv

In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting

Sep 09, 2025
Via arXiv

MESH -- Understanding Videos Like Human: Measuring Hallucinations in Large Video Models

Sep 10, 2025
Via arXiv

AdsQA: Towards Advertisement Video Understanding

Sep 10, 2025
Via arXiv

ChainReaction! Structured Approach with Causal Chains as Intermediate Representations for Improved and Explainable Causal Video Question Answering

Aug 28, 2025
Via arXiv

CVBench: Evaluating Cross-Video Synergies for Complex Multimodal Understanding and Reasoning

Aug 28, 2025
Via arXiv

MovieCORE: COgnitive REasoning in Movies

Aug 26, 2025
Via arXiv

See What You Need: Query-Aware Visual Intelligence through Reasoning-Perception Loops

Aug 25, 2025
Via arXiv