Jiabo Ye

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Mar 19, 2024
Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

Jan 29, 2024
Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023
Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Nov 09, 2023
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Oct 08, 2023
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Jul 04, 2023
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

Jun 07, 2023
Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Apr 27, 2023
Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Feb 01, 2023
Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
