Xudong Lin

Revitalize Region Feature for Democratizing Video-Language Pre-training

Mar 19, 2022

All in One: Exploring Unified Video-Language Pre-training

Mar 14, 2022

Learning To Recognize Procedural Activities with Distant Supervision

Jan 26, 2022

CLIP-Event: Connecting Text and Images with Event Structures

Jan 13, 2022

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

Dec 20, 2021

Video-Text Pre-training with Learned Regions

Dec 06, 2021

Object-aware Video-language Pre-training for Retrieval

Dec 06, 2021

Joint Multimedia Event Extraction from Video and Article

Sep 27, 2021

Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos

Mar 23, 2021

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Jan 29, 2021