Visual Perception Viper


VIPER Strike: Defeating Visual Reasoning CAPTCHAs via Structured Vision-Language Inference

Jan 10, 2026

Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

Aug 24, 2025

VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making

Mar 19, 2025

Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

Jun 16, 2024