Anwen Hu

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

Apr 25, 2024
Via arXiv

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

Apr 23, 2024
Via arXiv

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Mar 19, 2024
Via arXiv

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023
Via arXiv

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Nov 09, 2023
Via arXiv

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Oct 08, 2023
Via arXiv

Explore and Tell: Embodied Visual Captioning in 3D Environments

Aug 21, 2023
Via arXiv

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Jul 04, 2023
Via arXiv

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Jun 27, 2023
Via arXiv

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

Jun 07, 2023
Via arXiv