Yun Cao

SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment

Aug 08, 2025
Via arXiv

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Jul 30, 2025
Via arXiv

Swin DiT: Diffusion Transformer using Pseudo Shifted Windows

May 19, 2025
Via arXiv

Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics

Apr 16, 2025
Via arXiv

DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model

Mar 24, 2025
Via arXiv

PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation

Mar 09, 2025
Via arXiv

Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning

Feb 17, 2025
Via arXiv

VI3DRM: Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis

Sep 12, 2024
Via arXiv

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation

May 28, 2024
Via arXiv

UVL: A Unified Framework for Video Tampering Localization

Sep 28, 2023
Via arXiv