Philipp Krähenbühl

Image and Video Tokenization with Binary Spherical Quantization

Jun 11, 2024
Via arXiv

Language-Image Models with 3D Understanding

May 06, 2024
Via arXiv

Distilling Vision-Language Models on Millions of Videos

Jan 11, 2024
Via arXiv

Language-conditioned Detection Transformer

Nov 29, 2023
Via arXiv

Training a Large Video Model on a Single Machine in a Day

Sep 28, 2023
Via arXiv

Long-tail Detection with Effective Class-Margins

Jan 23, 2023
Via arXiv

NMS Strikes Back

Dec 12, 2022
Via arXiv

Learning Video Representations from Large Language Models

Dec 08, 2022
Via arXiv

Real-time Online Video Detection with Temporal Smoothing Transformers

Sep 19, 2022
Via arXiv

Cross-view Transformers for real-time Map-view Semantic Segmentation

May 05, 2022
Via arXiv