Quanzeng You

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

May 28, 2024
Via arXiv

ViTAR: Vision Transformer with Any Resolution

Mar 28, 2024
Via arXiv

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Mar 03, 2024
Via arXiv

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024
Via arXiv

COCO is "ALL" You Need for Visual Instruction Fine-tuning

Jan 17, 2024
Via arXiv

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Dec 04, 2023
Via arXiv

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

Dec 03, 2023
Via arXiv

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

Nov 28, 2023
Via arXiv

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

Oct 10, 2023
Via arXiv

RefineVIS: Video Instance Segmentation with Temporal Attention Refinement

Jun 07, 2023
Via arXiv