Dongzhi Jiang

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Apr 19, 2024
Via arXiv

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024
Via arXiv

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Mar 21, 2024
Via arXiv

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

Apr 03, 2023
Via arXiv