Zejun Li

AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs

May 27, 2025

OViP: Online Vision-Language Preference Learning

May 21, 2025

Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference

Dec 17, 2024

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models

Jun 09, 2024

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

May 28, 2024

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Apr 02, 2024

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

Oct 17, 2023

A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training

Jun 11, 2022

MVP: Multi-Stage Vision-Language Pre-Training via Multi-Level Semantic Alignment

Jan 29, 2022

Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

Nov 05, 2021