Arsha Nagrani

VicTR: Video-conditioned Text Representations for Activity Recognition
Apr 05, 2023
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo

AutoAD: Movie Description in Context
Mar 29, 2023
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Mar 29, 2023
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Mar 21, 2023
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Mar 06, 2023
Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman

AVATAR submission to the Ego4D AV Transcription Challenge
Nov 18, 2022
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Aug 14, 2022
Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid

M&M Mix: A Multimodal Multiview Transformer Ensemble
Jun 20, 2022
Xuehan Xiong, Anurag Arnab, Arsha Nagrani, Cordelia Schmid