Qinghao Ye

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024
Via arXiv

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

Mar 01, 2024
Via arXiv

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

Feb 26, 2024
Via arXiv

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

Dec 14, 2023
Via arXiv

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Dec 13, 2023
Via arXiv

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023
Via arXiv

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Nov 09, 2023
Via arXiv

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

Oct 08, 2023
Via arXiv

Evaluation and Analysis of Hallucination in Large Vision-Language Models

Aug 29, 2023
Via arXiv

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

Jul 17, 2023
Via arXiv