Gedas Bertasius

Improving video retrieval using multilingual knowledge transfer

Aug 28, 2022

Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism

Jul 24, 2022

Learning to Retrieve Videos by Asking Questions

May 13, 2022

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

Apr 06, 2022

Long Movie Clip Classification with State-Space Video Models

Apr 04, 2022

TALLFormer: Temporal Action Localization with Long-memory Transformer

Apr 04, 2022

Learning To Recognize Procedural Activities with Distant Supervision

Jan 26, 2022

Long-Short Temporal Contrastive Learning of Video Transformers

Jul 08, 2021

Is Space-Time Attention All You Need for Video Understanding?

Feb 24, 2021

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Jan 29, 2021