Muhammad Ferjad Naeem

Toward a Diffusion-Based Generalist for Dense Vision Tasks

Jun 29, 2024

How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

May 08, 2024

Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

May 06, 2024

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Mar 14, 2024

FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

Mar 11, 2024

Learning to Prompt with Text Only Supervision for Vision-Language Models

Jan 04, 2024

SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

Nov 27, 2023

SILC: Improving Vision Language Pretraining with Self-Distillation

Oct 20, 2023

Introducing Language Guidance in Prompt-based Continual Learning

Aug 30, 2023

I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification

Dec 05, 2022