Haiyang Xu

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Mar 19, 2024
Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

Bayesian Diffusion Models for 3D Shape Reconstruction

Mar 11, 2024
Haiyang Xu, Yu Lei, Zeyuan Chen, Xiang Zhang, Yue Zhao, Yilin Wang, Zhuowen Tu

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

Mar 01, 2024
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

Feb 26, 2024
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

Feb 24, 2024
Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

Jan 29, 2024
Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

Dec 14, 2023
Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Dec 13, 2023
Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023
Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Nov 09, 2023
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
