Zhuowen Tu

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025
Via arXiv

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Jul 30, 2025
Via arXiv

AuthGuard: Generalizable Deepfake Detection via Language Guidance

Jun 04, 2025
Via arXiv

Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels

May 20, 2025
Via arXiv

Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

May 07, 2025
Via arXiv

Efficient Scaling of Diffusion Transformers for Text-to-Image Generation

Dec 16, 2024
Via arXiv

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

Oct 04, 2024
Via arXiv

Open-World Dynamic Prompt and Continual Visual Representation Learning

Sep 09, 2024
Via arXiv

Goldfish: Monolingual Language Models for 350 Languages

Aug 19, 2024
Via arXiv

OmniControlNet: Dual-stage Integration for Conditional Image Generation

Jun 09, 2024
Via arXiv