Anurag Arnab

Pixel Aligned Language Models

Dec 14, 2023
Via arXiv

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023
Via arXiv

UnLoc: A Unified Framework for Video Localization Tasks

Aug 21, 2023
Via arXiv

Does Visual Pretraining Help End-to-End Reasoning?

Jul 17, 2023
Via arXiv

Dense Video Object Captioning from Disjoint Supervision

Jun 20, 2023
Via arXiv

How can objects help action recognition?

Jun 20, 2023
Via arXiv

Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

Jun 07, 2023
Via arXiv

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Via arXiv

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Apr 24, 2023
Via arXiv

VicTR: Video-conditioned Text Representations for Activity Recognition

Apr 05, 2023
Via arXiv