
Yongfei Liu

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

May 28, 2024

ViTAR: Vision Transformer with Any Resolution

Mar 28, 2024

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Mar 03, 2024

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Dec 04, 2023

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

Dec 03, 2023

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

Nov 28, 2023

Grounded Image Text Matching with Mismatched Relation Reasoning

Aug 04, 2023

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Mar 29, 2023

Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

Mar 02, 2023