Picture for Le Zhuo

Le Zhuo

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Add code
Apr 22, 2025
Viaarxiv icon

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Add code
Apr 10, 2025
Viaarxiv icon

OmniCaptioner: One Captioner to Rule Them All

Add code
Apr 09, 2025
Viaarxiv icon

Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision

Add code
Apr 08, 2025
Viaarxiv icon

Vision-to-Music Generation: A Survey

Add code
Mar 27, 2025
Viaarxiv icon

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Add code
Mar 27, 2025
Viaarxiv icon

TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Add code
Mar 10, 2025
Viaarxiv icon

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

Add code
Jan 23, 2025
Figure 1 for IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Figure 2 for IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Figure 3 for IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Figure 4 for IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Viaarxiv icon

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Add code
Dec 12, 2024
Viaarxiv icon

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Add code
Nov 22, 2024
Viaarxiv icon