Vittorio Ferrari

MaskInversion: Localized Embeddings via Optimization of Explainability Maps

Jul 29, 2024

HAMMR: HierArchical MultiModal React agents for generic VQA

Apr 08, 2024

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Dec 05, 2023

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

Aug 22, 2023

Estimating Generic 3D Room Structures from 2D Annotations

Jun 15, 2023

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

Jun 15, 2023

NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations

Jun 15, 2023

CAD-Estate: Large-scale CAD Model Annotation in RGB Videos

Jun 15, 2023

Tracking by 3D Model Estimation of Unknown Objects in Videos

Apr 13, 2023

Connecting Vision and Language with Video Localized Narratives

Mar 15, 2023