Zhuowen Tu

Dolfin: Diffusion Layout Transformers without Autoencoder

Oct 25, 2023

SkeleTR: Towards Skeleton-based Action Recognition in the Wild

Sep 20, 2023

Object-Centric Multiple Object Tracking

Sep 05, 2023

Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability

Aug 29, 2023

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Aug 19, 2023

Patched Denoising Diffusion Models For High-Resolution Image Synthesis

Aug 02, 2023

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

Jul 19, 2023

DocTr: Document Transformer for Structured Information Extraction in Documents

Jul 16, 2023

Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

May 11, 2023

Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

Apr 17, 2023