Santiago Castro

CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

Mar 01, 2024

Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction

Sep 22, 2023

Scalable Performance Analysis for Vision-Language Models

May 31, 2023

A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models

May 21, 2023

Phenaki: Variable Length Video Generation From Open Domain Textual Description

Oct 05, 2022

WildQA: In-the-Wild Video Question Answering

Sep 14, 2022

FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

Mar 24, 2022

When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs

Feb 21, 2022

WhyAct: Identifying Action Reasons in Lifestyle Vlogs

Sep 09, 2021

Fill-in-the-blank as a Challenging Video Understanding Evaluation Framework

Apr 09, 2021