Qingyun Li

InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition
May 21, 2025

Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
Mar 06, 2025

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Feb 11, 2025

PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection
Jan 23, 2025

A Simple Aerial Detection Baseline of Multimodal Language Models
Jan 16, 2025

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Dec 06, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Jun 13, 2024

FLoRA: Low-Rank Core Space for N-dimension
May 23, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Feb 29, 2024