LLaVA

LLaVA (Large Language and Vision Assistant) is a multimodal model family that connects a vision encoder to a large language model via visual instruction tuning. The papers below are tagged under LLaVA.

LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model

Aug 07, 2025

The Effect of Compression Techniques on Large Multimodal Language Models in the Medical Domain

Jul 29, 2025

TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model

Jul 28, 2025

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study

Jul 28, 2025

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression

Jul 28, 2025

GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs

Jul 24, 2025

Controllable Hybrid Captioner for Improved Long-form Video Understanding

Jul 22, 2025

Automatic Fine-grained Segmentation-assisted Report Generation

Jul 22, 2025

Pixels to Principles: Probing Intuitive Physics Understanding in Multimodal Language Models

Jul 22, 2025

Automating Steering for Safe Multimodal Large Language Models

Jul 17, 2025