Anurag Arnab

Streaming Dense Video Captioning

Apr 01, 2024

Time-, Memory- and Parameter-Efficient Visual Adaptation

Feb 05, 2024

Pixel Aligned Language Models

Dec 14, 2023

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023

UnLoc: A Unified Framework for Video Localization Tasks

Aug 21, 2023

Does Visual Pretraining Help End-to-End Reasoning?

Jul 17, 2023

How can objects help action recognition?

Jun 20, 2023

Dense Video Object Captioning from Disjoint Supervision

Jun 20, 2023

Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

Jun 07, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023