Yuchao Gu

Olaf-World: Orienting Latent Actions for Video World Modeling

Feb 10, 2026
Via arXiv

MIND: Benchmarking Memory Consistency and Action Control in World Models

Feb 08, 2026
Via arXiv

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

Apr 08, 2025
Via arXiv

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Mar 25, 2025
Via arXiv

Edit Transfer: Learning Image Editing via Vision In-Context Relations

Mar 17, 2025
Via arXiv

ROICtrl: Boosting Instance Control for Visual Generation

Nov 27, 2024
Via arXiv

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Oct 10, 2024
Via arXiv

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Aug 22, 2024
Via arXiv

DragAnything: Motion Control for Anything using Entity Representation

Mar 15, 2024
Via arXiv

Towards A Better Metric for Text-to-Video Generation

Jan 15, 2024
Via arXiv