
Renrui Zhang

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Mar 21, 2024

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Mar 14, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024

Language-Assisted 3D Scene Understanding

Dec 31, 2023

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

Dec 24, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023

Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

Dec 19, 2023

Gradient-based Parameter Selection for Efficient Fine-Tuning

Dec 15, 2023

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V

Dec 15, 2023