Yu Qiao

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

SyncVIS: Synchronized Video Instance Segmentation

Dec 01, 2024

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Dec 01, 2024

OASIS: Open Agent Social Interaction Simulations with One Million Agents

Nov 26, 2024

OASIS: Open Agents Social Interaction Simulations on One Million Agents

Nov 21, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Nov 21, 2024

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Nov 20, 2024

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Nov 16, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Nov 15, 2024

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving

Nov 08, 2024