Luowei Zhou

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

Jun 03, 2022

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

May 29, 2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Apr 28, 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

Jan 15, 2022

CLIP-Event: Connecting Text and Images with Event Structures

Jan 13, 2022

RegionCLIP: Region-based Language-Image Pretraining

Dec 16, 2021

BEVT: BERT Pretraining of Video Transformers

Dec 02, 2021

Florence: A New Foundation Model for Computer Vision

Nov 22, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

Jun 08, 2021

CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning

Apr 13, 2021