Botian Shi

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Jun 17, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

May 24, 2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

May 06, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Apr 29, 2024

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Apr 23, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Feb 19, 2024

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

Feb 06, 2024

LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

Feb 02, 2024