Alert button
Picture for Anwen Hu

Anwen Hu

Alert button

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Add code
Bookmark button
Alert button
Mar 19, 2024
Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

Figure 1 for mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Figure 2 for mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Figure 3 for mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Figure 4 for mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Viaarxiv icon

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Add code
Bookmark button
Alert button
Nov 30, 2023
Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

Viaarxiv icon

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Add code
Bookmark button
Alert button
Nov 09, 2023
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

Viaarxiv icon

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Add code
Bookmark button
Alert button
Oct 08, 2023
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

Figure 1 for UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Figure 2 for UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Figure 3 for UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Figure 4 for UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Viaarxiv icon

Explore and Tell: Embodied Visual Captioning in 3D Environments

Add code
Bookmark button
Alert button
Aug 21, 2023
Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin

Figure 1 for Explore and Tell: Embodied Visual Captioning in 3D Environments
Figure 2 for Explore and Tell: Embodied Visual Captioning in 3D Environments
Figure 3 for Explore and Tell: Embodied Visual Captioning in 3D Environments
Figure 4 for Explore and Tell: Embodied Visual Captioning in 3D Environments
Viaarxiv icon

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Add code
Bookmark button
Alert button
Jul 04, 2023
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

Figure 1 for mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Figure 2 for mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Figure 3 for mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Figure 4 for mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Viaarxiv icon

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Add code
Bookmark button
Alert button
Jun 27, 2023
Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin

Viaarxiv icon

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

Add code
Bookmark button
Alert button
Jun 07, 2023
Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

Figure 1 for Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Figure 2 for Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Figure 3 for Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Figure 4 for Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Viaarxiv icon

Movie101: A New Movie Understanding Benchmark

Add code
Bookmark button
Alert button
May 20, 2023
Zihao Yue, Qi Zhang, Anwen Hu, Liang Zhang, Ziheng Wang, Qin Jin

Figure 1 for Movie101: A New Movie Understanding Benchmark
Figure 2 for Movie101: A New Movie Understanding Benchmark
Figure 3 for Movie101: A New Movie Understanding Benchmark
Figure 4 for Movie101: A New Movie Understanding Benchmark
Viaarxiv icon