Hanoona Rasheed

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Jun 13, 2024

PALO: A Polyglot Large Multimodal Model for 5B People

Mar 05, 2024

GLaMM: Pixel Grounding Large Multimodal Model

Nov 06, 2023

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Jun 08, 2023

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

Mar 27, 2023

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

Dec 08, 2022

Fine-tuned CLIP Models are Efficient Video Learners

Dec 06, 2022

MaPLe: Multi-modal Prompt Learning

Oct 06, 2022

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

Jul 07, 2022

Multi-modal Transformers Excel at Class-agnostic Object Detection

Nov 22, 2021