
Soravit Changpinyo

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023

What You See is What You Read? Improving Text-Image Alignment Evaluation

May 22, 2023

Connecting Vision and Language with Video Localized Narratives

Mar 15, 2023

Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Feb 24, 2023

MetaCLUE: Towards Comprehensive Visual Metaphors Research

Dec 19, 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Sep 16, 2022

PreSTU: Pre-Training for Scene-Text Understanding

Sep 12, 2022

Towards Multi-Lingual Visual Question Answering

Sep 12, 2022

All You May Need for VQA are Image Captions

May 04, 2022