LLaVA


LLaVA (Large Language and Vision Assistant) is an open-source multimodal model that connects a pretrained vision encoder to a large language model for visual instruction following. Recent papers on LLaVA and related vision-language models are listed below.
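As a quick orientation, the sketch below shows the single-turn prompt template used by LLaVA-1.5-style checkpoints, where an `<image>` placeholder marks the position at which projected vision tokens are spliced into the text sequence. The helper function name is my own, and the exact template varies between LLaVA versions, so treat this as illustrative rather than canonical.

```python
def build_llava_prompt(question: str) -> str:
    """Build a single-turn LLaVA-1.5-style prompt (illustrative sketch).

    The "<image>" placeholder is replaced by the vision encoder's projected
    image tokens before the sequence reaches the language model; template
    details differ across LLaVA variants.
    """
    return f"USER: <image>\n{question} ASSISTANT:"


prompt = build_llava_prompt("What is shown in this image?")
print(prompt)
```

The model's generated answer then follows the trailing `ASSISTANT:` marker.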

VGR: Visual Grounded Reasoning (Jun 16, 2025)

Dynamic Modality Scheduling for Multimodal Large Models via Confidence, Uncertainty, and Semantic Consistency (Jun 15, 2025)

Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models (Jun 15, 2025)

Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs (Jun 13, 2025)

Quizzard@INOVA Challenge 2025 -- Track A: Plug-and-Play Technique in Interleaved Multi-Image Model (Jun 13, 2025)

Can Sound Replace Vision in LLaVA With Token Substitution? (Jun 12, 2025)

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs (Jun 12, 2025)

LLaVA-c: Continual Improved Visual Instruction Tuning (Jun 10, 2025)

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better (Jun 10, 2025)

Learning Compact Vision Tokens for Efficient Large Multimodal Models (Jun 08, 2025)