
Antoine Yang

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

VidChapters-7M: Video Chapters at Scale

Sep 25, 2023

CoVR: Learning Composed Video Retrieval from Web Video Captions

Aug 28, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Mar 21, 2023

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Jun 16, 2022

Learning to Answer Visual Questions from Web Videos

May 11, 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers

Mar 30, 2022

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Dec 01, 2020

NAS evaluation is frustratingly hard

Feb 13, 2020

MANAS: Multi-Agent Neural Architecture Search

Sep 05, 2019