Lewei Lu

Multimodal 3D Reasoning Segmentation with Complex Scenes

Nov 21, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Nov 15, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

Oct 21, 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

Jul 22, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Jun 12, 2024

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024

Needle In A Multimodal Haystack

Jun 11, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Jun 11, 2024

Parameter-Inverted Image Pyramid Networks

Jun 06, 2024