Pan Zhang

Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Nov 09, 2024
Via arXiv

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Oct 23, 2024
Via arXiv

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Oct 22, 2024
Via arXiv

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Oct 21, 2024
Via arXiv

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

Oct 09, 2024
Via arXiv

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

Oct 08, 2024
Via arXiv

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Jul 16, 2024
Via arXiv

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024
Via arXiv

Research on target detection method of distracted driving behavior based on improved YOLOv8

Jul 02, 2024
Via arXiv

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Jul 01, 2024
Via arXiv