Guanzhou Chen

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 26, 2026

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

Mar 21, 2026

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Mar 10, 2026

STARS: Shared-specific Translation and Alignment for missing-modality Remote Sensing Semantic Segmentation

Jan 24, 2026

Sat2Flow: A Structure-Aware Diffusion Framework for Human Flow Generation from Satellite Imagery

Aug 27, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning

Jun 11, 2025

BFA-YOLO: Balanced multiscale object detection network for multi-view building facade attachments detection

Sep 06, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024