Xudong Lin

Non-Sequential Graph Script Induction via Multimedia Grounding

May 27, 2023
Via arXiv

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Apr 07, 2023
Via arXiv

Supervised Masked Knowledge Distillation for Few-Shot Transformers

Mar 29, 2023
Via arXiv

In Defense of Structural Symbolic Representation for Video Event-Relation Prediction

Jan 06, 2023
Via arXiv

TempCLR: Temporal Alignment Representation with Contrastive Learning

Dec 28, 2022
Via arXiv

Video Event Extraction via Tracking Visual States of Arguments

Nov 05, 2022
Via arXiv

Learning to Decompose Visual Features with Latent Textual Prompts

Oct 09, 2022
Via arXiv

Multimodal Event Graphs: Towards Event Centric Understanding of Multimodal World

Jun 14, 2022
Via arXiv

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

Jun 05, 2022
Via arXiv

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

May 29, 2022
Via arXiv