Sihan Yang

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

May 29, 2025 (via arXiv)

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

May 28, 2025 (via arXiv)

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

May 27, 2025 (via arXiv)

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

May 20, 2025 (via arXiv)

Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization

Mar 17, 2025 (via arXiv)

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

Nov 18, 2024 (via arXiv)

SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images

Aug 19, 2024 (via arXiv)

Calibrated Self-Rewarding Vision Language Models

May 23, 2024 (via arXiv)