Qingyun Li

Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use

Sep 16, 2025

GLEAM: Learning to Match and Explain in Cross-View Geo-Localization

Sep 09, 2025

InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition

May 21, 2025

Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model

Mar 06, 2025

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Feb 11, 2025

PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection

Jan 23, 2025

A Simple Aerial Detection Baseline of Multimodal Language Models

Jan 16, 2025

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024