Arsha Nagrani

AutoAD III: The Prequel -- Back to the Pixels

Apr 22, 2024

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Apr 09, 2024

Streaming Dense Video Captioning

Apr 01, 2024

Video Summarization: Towards Entity-Aware Captions

Dec 01, 2023

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Oct 10, 2023

VidChapters-7M: Video Chapters at Scale

Sep 25, 2023

LanSER: Language-Model Supported Speech Emotion Recognition

Sep 07, 2023

UnLoc: A Unified Framework for Video Localization Tasks

Aug 21, 2023

Modular Visual Question Answering via Code Generation

Jun 08, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023