Picture for Hartwig Adam

Hartwig Adam

VideoPrism: A Foundational Visual Encoder for Video Understanding

Add code
Feb 20, 2024
Figure 1 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Figure 2 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Figure 3 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Figure 4 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Viaarxiv icon

Distilling Vision-Language Models on Millions of Videos

Add code
Jan 11, 2024
Figure 1 for Distilling Vision-Language Models on Millions of Videos
Figure 2 for Distilling Vision-Language Models on Millions of Videos
Figure 3 for Distilling Vision-Language Models on Millions of Videos
Figure 4 for Distilling Vision-Language Models on Millions of Videos
Viaarxiv icon

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023
Figure 1 for VideoPoet: A Large Language Model for Zero-Shot Video Generation
Figure 2 for VideoPoet: A Large Language Model for Zero-Shot Video Generation
Figure 3 for VideoPoet: A Large Language Model for Zero-Shot Video Generation
Figure 4 for VideoPoet: A Large Language Model for Zero-Shot Video Generation
Viaarxiv icon

PolyMaX: General Dense Prediction with Mask Transformer

Add code
Nov 09, 2023
Figure 1 for PolyMaX: General Dense Prediction with Mask Transformer
Figure 2 for PolyMaX: General Dense Prediction with Mask Transformer
Figure 3 for PolyMaX: General Dense Prediction with Mask Transformer
Figure 4 for PolyMaX: General Dense Prediction with Mask Transformer
Viaarxiv icon

SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset

Add code
Sep 21, 2023
Figure 1 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Figure 2 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Figure 3 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Figure 4 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Viaarxiv icon

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Add code
Jul 06, 2023
Figure 1 for VideoGLUE: Video General Understanding Evaluation of Foundation Models
Figure 2 for VideoGLUE: Video General Understanding Evaluation of Foundation Models
Figure 3 for VideoGLUE: Video General Understanding Evaluation of Foundation Models
Figure 4 for VideoGLUE: Video General Understanding Evaluation of Foundation Models
Viaarxiv icon

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Add code
May 10, 2023
Figure 1 for Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Figure 2 for Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Figure 3 for Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Figure 4 for Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Viaarxiv icon

Unified Visual Relationship Detection with Vision and Language Models

Add code
Mar 16, 2023
Figure 1 for Unified Visual Relationship Detection with Vision and Language Models
Figure 2 for Unified Visual Relationship Detection with Vision and Language Models
Figure 3 for Unified Visual Relationship Detection with Vision and Language Models
Figure 4 for Unified Visual Relationship Detection with Vision and Language Models
Viaarxiv icon

Improving Zero-shot Generalization and Robustness of Multi-modal Models

Add code
Dec 04, 2022
Figure 1 for Improving Zero-shot Generalization and Robustness of Multi-modal Models
Figure 2 for Improving Zero-shot Generalization and Robustness of Multi-modal Models
Figure 3 for Improving Zero-shot Generalization and Robustness of Multi-modal Models
Figure 4 for Improving Zero-shot Generalization and Robustness of Multi-modal Models
Viaarxiv icon

MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models

Add code
Oct 04, 2022
Figure 1 for MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Figure 2 for MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Figure 3 for MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Figure 4 for MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Viaarxiv icon