Picture for Yunhang Shen

Yunhang Shen

Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities?

Add code
Feb 27, 2026
Viaarxiv icon

Unleashing MLLMs on the Edge: A Unified Framework for Cross-Modal ReID via Adaptive SVD Distillation

Add code
Feb 13, 2026
Viaarxiv icon

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Add code
Jan 27, 2026
Viaarxiv icon

SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM

Add code
Nov 18, 2025
Viaarxiv icon

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

Add code
Oct 10, 2025
Viaarxiv icon

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

Add code
May 28, 2025
Viaarxiv icon

What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation

Add code
May 26, 2025
Figure 1 for What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
Figure 2 for What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
Figure 3 for What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
Figure 4 for What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation
Viaarxiv icon

Pseudo-Label Quality Decoupling and Correction for Semi-Supervised Instance Segmentation

Add code
May 16, 2025
Viaarxiv icon

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Add code
May 06, 2025
Viaarxiv icon

BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution

Add code
Apr 04, 2025
Viaarxiv icon