Bingqi Ma

Improving Joint Audio-Video Generation with Cross-Modal Context Learning

Mar 19, 2026

AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization

Mar 18, 2026

ADT: Tuning Diffusion Models with Adversarial Supervision

Apr 15, 2025

High-Fidelity Diffusion Face Swapping with ID-Constrained Facial Conditioning

Mar 28, 2025

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Dec 15, 2024

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Dec 12, 2024

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Nov 29, 2024

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Jun 17, 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Apr 19, 2024

MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder

Mar 07, 2024