
Jordi Pont-Tuset

Evaluating Numerical Reasoning in Text-to-Image Models

Jun 20, 2024
Via arXiv

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

May 27, 2024
Via arXiv

DOCCI: Descriptions of Connected and Contrasting Images

Apr 30, 2024
Via arXiv

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

Apr 25, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Rich Human Feedback for Text-to-Image Generation

Dec 15, 2023
Via arXiv

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

Oct 30, 2023
Via arXiv

EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023

Jun 29, 2023
Via arXiv

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Mar 21, 2023
Via arXiv

Connecting Vision and Language with Video Localized Narratives

Mar 15, 2023
Via arXiv