Zhuowen Tu

Real Deep Research for AI, Robotics and Beyond

Oct 23, 2025
Via arXiv

C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing

Oct 06, 2025
Via arXiv

VideoNSA: Native Sparse Attention Scales Video Understanding

Oct 02, 2025
Via arXiv

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025
Via arXiv

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Jul 30, 2025
Via arXiv

AuthGuard: Generalizable Deepfake Detection via Language Guidance

Jun 04, 2025
Via arXiv

Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels

May 20, 2025
Via arXiv

Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

May 07, 2025
Via arXiv

Efficient Scaling of Diffusion Transformers for Text-to-Image Generation

Dec 16, 2024
Via arXiv

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

Oct 04, 2024
Via arXiv