Arsha Nagrani

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Apr 09, 2024
Juhong Min, Shyamal Buch, Arsha Nagrani, Minsu Cho, Cordelia Schmid

Streaming Dense Video Captioning

Apr 01, 2024
Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023
Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Oct 10, 2023
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

VidChapters-7M: Video Chapters at Scale

Sep 25, 2023
Antoine Yang, Arsha Nagrani, Ivan Laptev, Josef Sivic, Cordelia Schmid

LanSER: Language-Model Supported Speech Emotion Recognition

Sep 07, 2023
Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou

UnLoc: A Unified Framework for Video Localization Tasks

Aug 21, 2023
Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid

Modular Visual Question Answering via Code Generation

Jun 08, 2023
Sanjay Subramanian, Medhini Narasimhan, Kushal Khangaonkar, Kevin Yang, Arsha Nagrani, Cordelia Schmid, Andy Zeng, Trevor Darrell, Dan Klein

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

Verbs in Action: Improving verb understanding in video-language models

Apr 13, 2023
Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid
