Zhuofan Zong

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Jun 17, 2024
Via arXiv

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Apr 19, 2024
Via arXiv

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024
Via arXiv

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Mar 25, 2024
Via arXiv

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

May 29, 2023
Via arXiv

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

Apr 03, 2023
Via arXiv

DETRs with Collaborative Hybrid Assignments Training

Nov 22, 2022
Via arXiv

Self-slimmed Vision Transformer

Nov 24, 2021
Via arXiv

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

Oct 23, 2021
Via arXiv