
Lu Yuan


OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

Sep 15, 2022

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Aug 29, 2022

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Aug 25, 2022

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

Aug 25, 2022

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Jul 26, 2022

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

Jul 21, 2022

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

Jul 14, 2022

Should All Proposals be Treated Equally in Object Detection?

Jul 07, 2022

Semantic Image Synthesis via Diffusion Models

Jun 30, 2022

GLIPv2: Unifying Localization and Vision-Language Understanding

Jun 12, 2022