Xiaoshi Wu

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

May 01, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024

ECNet: Effective Controllable Text-to-Image Diffusion Models

Mar 27, 2024

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

Mar 20, 2024

JourneyDB: A Benchmark for Generative Image Understanding

Jul 03, 2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Jun 15, 2023

Better Aligning Text-to-Image Models with Human Preference

Mar 25, 2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Mar 23, 2023

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

Dec 02, 2021

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision

Aug 12, 2021