Weijia Mao

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

May 13, 2026

BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Feb 15, 2026

UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model

Feb 15, 2026

Mitty: Diffusion-based Human-to-Robot Video Generation

Dec 19, 2025

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

May 29, 2025

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Mar 25, 2025

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Mar 05, 2025

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths

Feb 10, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Aug 22, 2024

ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors

Dec 20, 2023