Florian Metze

Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models

Mar 18, 2021
Po-Yao Huang, Mandela Patrick, Junjie Hu, Graham Neubig, Florian Metze, Alexander Hauptmann

Via arXiv

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning

Mar 18, 2021
Mandela Patrick, Yuki M. Asano, Bernie Huang, Ishan Misra, Florian Metze, Joao Henriques, Andrea Vedaldi

Via arXiv

NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

Feb 16, 2021
Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black

Via arXiv

Audio-Visual Event Recognition through the lens of Adversary

Nov 15, 2020
Juncheng B Li, Kaixin Ma, Shuhui Qu, Po-Yao Huang, Florian Metze

Via arXiv

Multimodal Speech Recognition with Unstructured Audio Masking

Oct 16, 2020
Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

Via arXiv

On Long-Tailed Phenomena in Neural Machine Translation

Oct 10, 2020
Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze

Via arXiv

Support-set bottlenecks for video-text representation learning

Oct 06, 2020
Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi

Via arXiv

Fine-Grained Grounding for Multimodal Speech Recognition

Oct 05, 2020
Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

Via arXiv

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations

Sep 12, 2020
Ze Cheng, Juncheng Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze

Via arXiv

How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

Aug 18, 2020
Amanda Duarte, Shruti Palaskar, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, Xavier Giro-i-Nieto

Via arXiv