
Zhenheng Yang

Growing Visual Generative Capacity for Pre-Trained MLLMs

Oct 02, 2025

Mixture of Contexts for Long Video Generation

Aug 28, 2025

UniAPO: Unified Multimodal Automated Prompt Optimization

Aug 25, 2025

Show-o2: Improved Native Unified Multimodal Models

Jun 18, 2025

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

May 29, 2025

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

May 16, 2025

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Apr 11, 2025

Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning

Mar 17, 2025

Long Context Tuning for Video Generation

Mar 13, 2025

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths

Feb 10, 2025