Qifeng Chen

SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality

Sep 12, 2024
Via arXiv

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

Sep 04, 2024
Via arXiv

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

Sep 02, 2024
Via arXiv

Diffusion-Based Visual Art Creation: A Survey and New Perspectives

Aug 22, 2024
Via arXiv

TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization

Aug 07, 2024
Via arXiv

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Jul 30, 2024
Via arXiv

Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection

Jul 22, 2024
Via arXiv

Gaussian-Informed Continuum for Physical Property Identification and Simulation

Jun 21, 2024
Via arXiv

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Jun 06, 2024
Via arXiv

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

Jun 04, 2024
Via arXiv