Xiaoming Wei

Meituan

LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation

Apr 15, 2025

Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes

Apr 14, 2025

Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models

Jan 28, 2025

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Jan 14, 2025

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder

Dec 23, 2024

High-Resolution Image Synthesis via Next-Token Prediction

Nov 22, 2024

Faster Multi-GPU Training with PPLL: A Pipeline Parallelism Framework Leveraging Local Learning

Nov 19, 2024

Denoising with a Joint-Embedding Predictive Architecture

Oct 02, 2024

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

Sep 26, 2024

Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding

Sep 12, 2024