AJ Piergiovanni

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
Nov 13, 2023

Diversifying Joint Vision-Language Tokenization Learning
Jun 15, 2023

Joint Adaptive Representations for Image-Language Learning
Jun 01, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model
May 29, 2023

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Mar 30, 2023

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
Dec 06, 2022

Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Dec 02, 2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Sep 30, 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model
Sep 16, 2022

Pre-training image-language transformers for open-vocabulary tasks
Sep 09, 2022