Shentong Mo

Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning

Apr 15, 2026

LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation

Mar 29, 2026

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

Mar 09, 2026

pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation

Feb 26, 2026

GMAIL: Generative Modality Alignment for generated Image Learning

Feb 17, 2026

SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers

Feb 06, 2026

GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining

Jan 27, 2026

Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm

Aug 05, 2025

DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap

Mar 15, 2025

The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning

Dec 23, 2024