Yu Qiao

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Jun 12, 2024

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Jun 12, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024

Needle In A Multimodal Haystack

Jun 11, 2024

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

Jun 11, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Jun 11, 2024

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

Jun 09, 2024

Parameter-Inverted Image Pyramid Networks

Jun 06, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Jun 06, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks

Jun 06, 2024