Yunhang Shen

Aligning Multimodal LLM with Human Preference: A Survey

Mar 18, 2025

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?

Mar 10, 2025

Training-free Anomaly Event Detection via LLM-guided Symbolic Pattern Discovery

Feb 09, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Feb 07, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Jan 27, 2025

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Jan 09, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025

Probability-density-aware Semi-supervised Learning

Dec 23, 2024

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

Dec 12, 2024

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Dec 05, 2024