Licheng Yu

Sid

Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace

Feb 21, 2023

CiT: Curation in Training for Effective Vision-Language Data

Jan 05, 2023

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation

Nov 23, 2022

FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning

Oct 26, 2022

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

Jul 17, 2022

GEB+: A benchmark for generic event boundary captioning, grounding and text-based retrieval

Apr 10, 2022

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

Mar 10, 2022

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Mar 01, 2022

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval

Feb 15, 2022

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

Jun 08, 2021