
Florian Metze

On Adversarial Robustness of Large-scale Audio Visual Learning

Mar 23, 2022

Speech Summarization using Restricted Self-Attention

Oct 12, 2021

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

Oct 01, 2021

Differentiable Allophone Graphs for Language-Universal Speech Recognition

Jul 24, 2021

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

Jun 29, 2021

VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding

May 20, 2021

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

May 02, 2021

Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models

Apr 15, 2021

Self-supervised object detection from audio-visual correspondence

Apr 13, 2021

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning

Mar 18, 2021