Yunheng Li

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

Mar 24, 2026

Unifying Heterogeneous Multi-Modal Remote Sensing Detection Via Language-Pivoted Pretraining

Mar 02, 2026

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

Feb 13, 2026

MedQ-Bench: Evaluating and Exploring Medical Image Quality Assessment Abilities in MLLMs

Oct 02, 2025

Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment

Aug 12, 2025

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection

Dec 30, 2024

MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation

Dec 16, 2024

DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction

Dec 09, 2024

PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling

Nov 24, 2024

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Jun 02, 2024