Bohan Zhai

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

Mar 03, 2024
Via arXiv

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jan 18, 2024
Via arXiv

COCO is "ALL" You Need for Visual Instruction Fine-tuning

Jan 17, 2024
Via arXiv

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

Dec 04, 2023
Via arXiv

HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption

Oct 03, 2023
Via arXiv

Multitask Vision-Language Prompt Tuning

Dec 05, 2022
Via arXiv

Image2Point: 3D Point-Cloud Understanding with Pretrained 2D ConvNets

Jun 08, 2021
Via arXiv

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition

Mar 31, 2021
Via arXiv

You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module

Mar 24, 2021
Via arXiv