AJ Piergiovanni

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Sep 16, 2022
Via arXiv

Pre-training image-language transformers for open-vocabulary tasks

Sep 09, 2022
Via arXiv

Video Question Answering with Iterative Video-Text Co-Tokenization

Aug 01, 2022
Via arXiv

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

May 02, 2022
Via arXiv

FindIt: Generalized Localization with Natural Language Queries

Mar 31, 2022
Via arXiv

4D-Net for Learned Multi-Modal Alignment

Sep 02, 2021
Via arXiv

Unsupervised Discovery of Actions in Instructional Videos

Jun 28, 2021
Via arXiv

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Jun 21, 2021
Via arXiv

Unsupervised Action Segmentation for Instructional Videos

Jun 07, 2021
Via arXiv

Adaptive Intermediate Representations for Video Understanding

Apr 14, 2021
Via arXiv