Cordelia Schmid

Thoth

Towards Zero-Shot Multimodal Machine Translation

Jul 18, 2024

DataDream: Few-shot Guided Dataset Generation

Jul 16, 2024

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Jun 13, 2024

Smoke and Mirrors in Causal Downstream Tasks

May 27, 2024

Learning text-to-video retrieval from image captioning

Apr 26, 2024

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

Apr 24, 2024

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Apr 09, 2024

Learning Correlation Structures for Vision Transformers

Apr 05, 2024

SUGAR: Pre-training 3D Visual Representations for Robotics

Apr 01, 2024

Streaming Dense Video Captioning

Apr 01, 2024