Xiaoshi Wu

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

May 01, 2024
Via arXiv

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024
Via arXiv

ECNet: Effective Controllable Text-to-Image Diffusion Models

Mar 27, 2024
Via arXiv

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

Mar 20, 2024
Via arXiv

JourneyDB: A Benchmark for Generative Image Understanding

Jul 03, 2023
Via arXiv

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Jun 15, 2023
Via arXiv

Better Aligning Text-to-Image Models with Human Preference

Mar 25, 2023
Via arXiv

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Mar 23, 2023
Via arXiv

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

Dec 02, 2021
Via arXiv

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision

Aug 12, 2021
Via arXiv