Salman Khan

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Apr 17, 2025
Via arXiv

Deep Learning in Concealed Dense Prediction

Apr 15, 2025
Via arXiv

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

Mar 29, 2025
Via arXiv

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Mar 27, 2025
Via arXiv

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

Mar 18, 2025
Via arXiv

How Good is my Histopathology Vision-Language Foundation Model? A Holistic Benchmark

Mar 17, 2025
Via arXiv

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models

Mar 15, 2025
Via arXiv

Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology

Mar 13, 2025
Via arXiv

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Mar 13, 2025
Via arXiv

Handwritten Digit Recognition: An Ensemble-Based Approach for Superior Performance

Mar 08, 2025
Via arXiv