Yu Qiao

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Apr 29, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Apr 24, 2024

FedCCL: Federated Dual-Clustered Feature Contrast Under Domain Heterogeneity

Apr 14, 2024

Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data

Apr 10, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

Apr 09, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Apr 09, 2024

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

Apr 03, 2024

Linear Attention Sequence Parallelism

Apr 03, 2024

VideoDistill: Language-aware Vision Distillation for Video Question Answering

Apr 01, 2024

LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction

Apr 01, 2024