Shentong Mo

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

Mar 09, 2026
Via arXiv

pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation

Feb 26, 2026
Via arXiv

GMAIL: Generative Modality Alignment for generated Image Learning

Feb 17, 2026
Via arXiv

SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers

Feb 06, 2026
Via arXiv

GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining

Jan 27, 2026
Via arXiv

Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm

Aug 05, 2025
Via arXiv

DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap

Mar 15, 2025
Via arXiv

The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning

Dec 23, 2024
Via arXiv

Modality-Inconsistent Continual Learning of Multimodal Large Language Models

Dec 17, 2024
Via arXiv

Continual Audio-Visual Sound Separation

Nov 05, 2024
Via arXiv