Yun Cao

Swin DiT: Diffusion Transformer using Pseudo Shifted Windows

May 19, 2025
Via arXiv

Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics

Apr 16, 2025
Via arXiv

DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model

Mar 24, 2025
Via arXiv

PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation

Mar 09, 2025
Via arXiv

Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning

Feb 17, 2025
Via arXiv

VI3DRM: Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis

Sep 12, 2024
Via arXiv

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation

May 28, 2024
Via arXiv

UVL: A Unified Framework for Video Tampering Localization

Sep 28, 2023
Via arXiv

SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

Jul 21, 2022
Via arXiv

Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos

Dec 15, 2021
Via arXiv