Lewei Yao

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Apr 14, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Mar 07, 2024

PerceptionGPT: Effectively Fusing Visual Perception into LLM

Nov 11, 2023

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Oct 16, 2023

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

Jul 04, 2023

DetGPT: Detect What You Need via Reasoning

May 24, 2023

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

May 04, 2023

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

Apr 10, 2023

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Sep 20, 2022

Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework

Mar 10, 2022