Xiaoming Wei

Meituan

Forge-and-Quench: Enhancing Image Generation for Higher Fidelity in Unified Multimodal Models

Jan 08, 2026
Via arXiv

Active Intelligence in Video Avatars via Closed-loop World Modeling

Dec 23, 2025
Via arXiv

UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models

Dec 12, 2025
Via arXiv

LongCat-Image Technical Report

Dec 08, 2025
Via arXiv

InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing

Aug 19, 2025
Via arXiv

DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution

Jul 01, 2025
Via arXiv

PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Jun 12, 2025
Via arXiv

LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models

Jun 06, 2025
Via arXiv

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

May 28, 2025
Via arXiv

LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation

Apr 15, 2025
Via arXiv