Aoxue Li

GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

Jul 08, 2024

Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion

May 24, 2024

Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model

May 24, 2024

Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization

Mar 14, 2024

Efficient Transferability Assessment for Selection of Pre-trained Detectors

Mar 14, 2024

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation

Jan 30, 2024

CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Jan 18, 2024

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Dec 19, 2023

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

Aug 15, 2023

ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-real Novel View Synthesis via Contrastive Learning

Mar 30, 2023