
Sankalp Nagaonkar

Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding

Apr 13, 2026

Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments

Feb 10, 2025

BadScan: An Architectural Backdoor Attack on Visual State Space Models

Nov 26, 2024

TM-PATHVQA: 90000+ Textless Multilingual Questions for Medical Visual Question Answering

Jul 16, 2024