Picture for Qianqi Yan

Qianqi Yan

"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

Add code
Jul 17, 2025
Viaarxiv icon

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Add code
Feb 22, 2025
Viaarxiv icon

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

Add code
May 30, 2024
Viaarxiv icon

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

Add code
Jan 29, 2024
Figure 1 for Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA
Figure 2 for Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA
Figure 3 for Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA
Figure 4 for Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA
Viaarxiv icon