Mengping Yang

Diff-Aid: Inference-time Adaptive Interaction Denoising for Rectified Text-to-Image Generation

Feb 14, 2026

Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing

Feb 09, 2026

Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion

Jan 05, 2026

A unified multimodal understanding and generation model for cross-disciplinary scientific research

Jan 04, 2026

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Aug 07, 2025

Omni-Video: Democratizing Unified Video Understanding and Generation

Jul 09, 2025

Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption

Mar 12, 2025

Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data

Feb 02, 2025

E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Dec 30, 2024

EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

Jun 27, 2024