AJ Piergiovanni

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

Nov 13, 2023
AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova

Diversifying Joint Vision-Language Tokenization Learning

Jun 15, 2023
Vardaan Pahuja, AJ Piergiovanni, Anelia Angelova

Joint Adaptive Representations for Image-Language Learning

Jun 01, 2023
AJ Piergiovanni, Anelia Angelova

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

Mar 30, 2023
Weicheng Kuo, AJ Piergiovanni, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

Dec 06, 2022
AJ Piergiovanni, Weicheng Kuo, Anelia Angelova

Compound Tokens: Channel Fusion for Vision-Language Representation Learning

Dec 02, 2022
Maxwell Mbabilla Aladago, AJ Piergiovanni

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

Sep 30, 2022
Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Sep 16, 2022
Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

Pre-training image-language transformers for open-vocabulary tasks

Sep 09, 2022
AJ Piergiovanni, Weicheng Kuo, Anelia Angelova
