Liunian Harold Li

RegionCLIP: Region-based Language-Image Pretraining

Dec 16, 2021

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Dec 16, 2021

Grounded Language-Image Pre-training

Dec 07, 2021

Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Sep 14, 2021

BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis

Aug 10, 2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

Jul 13, 2021

Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions

Oct 24, 2020

VisualBERT: A Simple and Performant Baseline for Vision and Language

Aug 09, 2019

Efficient Contextual Representation Learning Without Softmax Layer

Feb 28, 2019