Raiymbek Akshulakov

From Unimodal to Multimodal: Scaling up Projectors to Align Modalities

Sep 28, 2024
Via arXiv

Do Vision and Language Encoders Represent the World Similarly?

Jan 10, 2024
Via arXiv

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Aug 17, 2023
Via arXiv