
Muhammad Maaz

Faculty of Health Sciences, McMaster University

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Jun 13, 2024

PALO: A Polyglot Large Multimodal Model for 5B People

Mar 05, 2024

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

Nov 22, 2023

GLaMM: Pixel Grounding Large Multimodal Model

Nov 06, 2023

On Orderings of Probability Vectors and Unsupervised Performance Estimation

Jun 16, 2023

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Jun 08, 2023

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

Mar 27, 2023

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

Dec 08, 2022

Fine-tuned CLIP Models are Efficient Video Learners

Dec 06, 2022

MaPLe: Multi-modal Prompt Learning

Oct 06, 2022