Aliaksandr Siarohin

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Jul 17, 2024
Via arXiv

VIMI: Grounding Video Generation through Multi-modal Instruction

Jul 08, 2024
Via arXiv

Taming Data and Transformers for Audio Generation

Jun 27, 2024
Via arXiv

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Jun 12, 2024
Via arXiv

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Jun 11, 2024
Via arXiv

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Jun 09, 2024
Via arXiv

SF-V: Single Forward Video Generation Model

Jun 06, 2024
Via arXiv

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Feb 29, 2024
Via arXiv

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Feb 22, 2024
Via arXiv

SPAD : Spatially Aware Multiview Diffusers

Feb 07, 2024
Via arXiv