Xizhou Zhu

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Apr 29, 2024
Via arXiv

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Mar 07, 2024
Via arXiv

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Feb 29, 2024
Via arXiv

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Jan 18, 2024
Via arXiv

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

Jan 16, 2024
Via arXiv

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Jan 15, 2024
Via arXiv

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

Jan 11, 2024
Via arXiv

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

Dec 25, 2023
Via arXiv

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Dec 14, 2023
Via arXiv

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Oct 30, 2023
Via arXiv