
Biao Gong

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Jan 16, 2026
Via arXiv

CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation

Jan 16, 2026
Via arXiv

VideoMAR: Autoregressive Video Generation with Continuous Tokens

Jun 18, 2025
Via arXiv

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Jun 11, 2025
Via arXiv

Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion

Jun 08, 2025
Via arXiv

Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis

May 29, 2025
Via arXiv

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

May 05, 2025
Via arXiv

DreamRelation: Relation-Centric Video Customization

Mar 10, 2025
Via arXiv

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Dec 12, 2024
Via arXiv

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Dec 08, 2024
Via arXiv