Rongrong Ji

Xiamen University, Peng Cheng Laboratory

VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference

Aug 25, 2025

DS$^2$Net: Detail-Semantic Deep Supervision Network for Medical Image Segmentation

Aug 06, 2025

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

Aug 01, 2025

Towards Universal Modal Tracking with Online Dense Temporal Token Learning

Jul 27, 2025

GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models

Jul 16, 2025

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

Jul 03, 2025

DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion

Jun 26, 2025

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence

Jun 09, 2025

Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective

May 28, 2025

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

May 28, 2025