Arsha Nagrani

A CLIP-Hitchhiker's Guide to Long Video Retrieval

May 17, 2022

Learning Audio-Video Modalities from Image Captions

Apr 01, 2022

End-to-end Generative Pretraining for Multimodal Video Captioning

Jan 20, 2022

VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge

Jan 12, 2022

Audio-Visual Synchronisation in the wild

Dec 08, 2021

Masking Modalities for Cross-modal Video Retrieval

Nov 03, 2021

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

Nov 01, 2021

Attention Bottlenecks for Multimodal Fusion

Jun 30, 2021

Localizing Visual Sounds the Hard Way

Apr 06, 2021

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

Apr 01, 2021