Yunhang Shen

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

May 28, 2025

What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation

May 26, 2025

Pseudo-Label Quality Decoupling and Correction for Semi-Supervised Instance Segmentation

May 16, 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

May 06, 2025

BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution

Apr 04, 2025

Aligning Multimodal LLM with Human Preference: A Survey

Mar 18, 2025

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?

Mar 10, 2025

Training-free Anomaly Event Detection via LLM-guided Symbolic Pattern Discovery

Feb 09, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Feb 07, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Jan 27, 2025