Qinghao Ye

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

Mar 01, 2024
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

Feb 26, 2024
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

Dec 14, 2023
Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Dec 13, 2023
Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023
Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Nov 09, 2023
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Oct 08, 2023
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

Evaluation and Analysis of Hallucination in Large Vision-Language Models

Aug 29, 2023
Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

Jul 17, 2023
Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang