Zhenbang Sun

OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs

Nov 18, 2025

ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Nov 13, 2025

Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs

Mar 31, 2025

Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding

Jan 28, 2025

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

Nov 22, 2024

Frame-Voyager: Learning to Query Frames for Video Large Language Models

Oct 07, 2024

Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization

Sep 22, 2024

Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment

Sep 08, 2023

CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning

Sep 05, 2023

HIRL: A General Framework for Hierarchical Image Representation Learning

May 26, 2022