Antoine Miech

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Dec 12, 2023

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

May 23, 2023

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime

May 03, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Mar 21, 2023

Zorro: the masked multimodal transformer

Jan 23, 2023

Multi-Task Learning of Object State Changes from Uncurated Videos

Nov 24, 2022

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Jun 16, 2022

Learning to Answer Visual Questions from Web Videos

May 11, 2022