
Cordelia Schmid

Thoth

Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

Apr 06, 2023

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

Apr 04, 2023

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Mar 29, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Mar 21, 2023

Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation

Dec 20, 2022

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Dec 10, 2022

Audiovisual Masked Autoencoders

Dec 09, 2022

Location-Aware Self-Supervised Transformers

Dec 05, 2022

WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Nov 25, 2022

AVATAR submission to the Ego4D AV Transcription Challenge

Nov 18, 2022