Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference

Jun 27, 2021
Riko Suzuki, Hitomi Yanaka, Koji Mineshima, Daisuke Bekki



This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action labels, and 1,942 action triplets of the form <subject, predicate, object> that can be translated into logical semantic representations. The dataset is expected to be useful for evaluating multimodal inference systems between videos and semantically complicated sentences including negation and quantification.

* Accepted to MMSR I 

