Shentong Mo

Rethinking Positive Pairs in Contrastive Learning

Oct 23, 2024

Multi-scale Multi-instance Visual Sound Localization and Segmentation

Aug 31, 2024

MultiMed: Massively Multimodal and Multitask Medical Understanding

Aug 22, 2024

IoT-LM: Large Multisensory Language Models for the Internet of Things

Jul 13, 2024

Semantic Grouping Network for Audio Source Separation

Jul 04, 2024

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

Jun 07, 2024

Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs

Jun 07, 2024

DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture

May 28, 2024

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

May 24, 2024

Unified Video-Language Pre-training with Synchronized Audio

May 12, 2024