Qingpei Guo

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Jun 11, 2025

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models

May 28, 2025

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

May 05, 2025

From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval

Apr 25, 2025

LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models

Mar 27, 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Mar 11, 2025

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

Feb 26, 2025

Referencing Where to Focus: Improving Visual Grounding with Referential Query

Dec 26, 2024

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Nov 19, 2024

LoTLIP: Improving Language-Image Pre-training for Long Text Understanding

Oct 07, 2024