Raiymbek Akshulakov

Do Vision and Language Encoders Represent the World Similarly?

Jan 10, 2024
Via arXiv

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Aug 17, 2023
Via arXiv