Anurag Arnab

Time-, Memory- and Parameter-Efficient Visual Adaptation

Feb 05, 2024
Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab

Pixel Aligned Language Models

Dec 14, 2023
Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023
Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang

UnLoc: A Unified Framework for Video Localization Tasks

Aug 21, 2023
Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid

Does Visual Pretraining Help End-to-End Reasoning?

Jul 17, 2023
Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, Cordelia Schmid

Dense Video Object Captioning from Disjoint Supervision

Jun 20, 2023
Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid

How can objects help action recognition?

Jun 20, 2023
Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid

Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

Jun 07, 2023
Shreyank N Gowda, Anurag Arnab, Jonathan Huang

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Apr 24, 2023
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid, Anurag Arnab
