
Yunhang Shen

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

Jun 27, 2024
Via arXiv

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024
Via arXiv

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024
Via arXiv

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Apr 24, 2024
Via arXiv

Multi-Modal Prompt Learning on Blind Image Quality Assessment

Apr 23, 2024
Via arXiv

Fusion-Mamba for Cross-modality Object Detection

Apr 14, 2024
Via arXiv

A General and Efficient Training for Transformer via Token Expansion

Mar 31, 2024
Via arXiv

Rethinking Centered Kernel Alignment in Knowledge Distillation

Jan 22, 2024
Via arXiv

Feature Denoising Diffusion Model for Blind Image Quality Assessment

Jan 22, 2024
Via arXiv

Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization

Jan 13, 2024
Via arXiv