Lu Yuan

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

Dec 12, 2022
Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Shuyang Gu, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

Dec 08, 2022
Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Yu-Gang Jiang

X-Paste: Revisit Copy-Paste at Scale with CLIP and StableDiffusion

Dec 07, 2022
Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

Nov 29, 2022
Shuquan Ye, Yujia Xie, Dongdong Chen, Yichong Xu, Lu Yuan, Chenguang Zhu, Jing Liao

Self-Supervised Learning based on Heat Equation

Nov 23, 2022
Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

Nov 22, 2022
Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

Sep 15, 2022
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Aug 29, 2022
Wan-Cyuan Fan, Yen-Chun Chen, DongDong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Aug 25, 2022
Xiaoyi Dong, Yinglin Zheng, Jianmin Bao, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

Aug 25, 2022
Rui Wang, Zuxuan Wu, Dongdong Chen, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Luowei Zhou, Lu Yuan, Yu-Gang Jiang
