Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shiho Matta

Which Way Does Time Flow? A Psychophysics-Grounded Evaluation for Vision-Language Models

Oct 30, 2025

Shiho Matta, Lis Kanashiro Pereira, Peitao Han, Fei Cheng, Shigeru Kitazawa

Abstract:Modern vision-language models (VLMs) excel at many multimodal tasks, yet their grasp of temporal information in video remains weak and, crucially, under-evaluated. We probe this gap with a deceptively simple but revealing challenge: judging the arrow of time (AoT)-whether a short clip is played forward or backward. We introduce AoT-PsyPhyBENCH, a psychophysically validated benchmark that tests whether VLMs can infer temporal direction in natural videos using the same stimuli and behavioral baselines established for humans. Our comprehensive evaluation of open-weight and proprietary, reasoning and non-reasoning VLMs reveals that most models perform near chance, and even the best lag far behind human accuracy on physically irreversible processes (e.g., free fall, diffusion/explosion) and causal manual actions (division/addition) that humans recognize almost instantly. These results highlight a fundamental gap in current multimodal systems: while they capture rich visual-semantic correlations, they lack the inductive biases required for temporal continuity and causal understanding. We release the code and data for AoT-PsyPhyBENCH to encourage further progress in the physical and temporal reasoning capabilities of VLMs.

* 10 pages

Via

Access Paper or Ask Questions

Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Oct 09, 2024

Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki

Figure 1 for Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Figure 2 for Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Figure 3 for Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Figure 4 for Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Abstract:Recent studies have demonstrated that few-shot learning allows LLMs to generate training data for supervised models at a low cost. However, the quality of LLM-generated data may not entirely match that of human-labeled data. This raises a crucial question: how should one balance the trade-off between the higher quality but more expensive human data and the lower quality yet substantially cheaper LLM-generated data? In this paper, we synthesized training data for conversational semantic frame analysis using GPT-4 and examined how to allocate budgets optimally to achieve the best performance. Our experiments, conducted across various budget levels, reveal that optimal cost-efficiency is achieved by combining both human and LLM-generated data across a wide range of budget levels. Notably, as the budget decreases, a higher proportion of LLM-generated data becomes more preferable.

* 12 pages including 4 pages of references and appendix. 7 figures

Via

Access Paper or Ask Questions