Paul Voigtlaender

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024

Text Prompting for Multi-Concept Video Customization by Autoregressive Generation

May 22, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Point-VOS: Pointing Up Video Object Segmentation

Feb 08, 2024

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Oct 17, 2023

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

Aug 22, 2023

Connecting Vision and Language with Video Localized Narratives

Mar 15, 2023

BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

Sep 25, 2022

STEP: Segmenting and Tracking Every Pixel

Feb 23, 2021

Reducing the Annotation Effort for Video Object Segmentation Datasets

Nov 02, 2020