Xiaotian Han

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

May 28, 2024

ViTAR: Vision Transformer with Any Resolution

Mar 28, 2024

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Mar 03, 2024

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024

COCO is "ALL" You Need for Visual Instruction Fine-tuning

Jan 17, 2024

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Jan 02, 2024

PokeMQA: Programmable knowledge editing for Multi-hop Question Answering

Dec 23, 2023

Chasing Fairness in Graphs: A GNN Architecture Perspective

Dec 19, 2023

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Dec 04, 2023

Marginal Nodes Matter: Towards Structure Fairness in Graphs

Oct 23, 2023