Mario Lučić

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration

Dec 12, 2023

Video OWL-ViT: Temporally-consistent open-world localization in video

Aug 22, 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Jul 12, 2023

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Apr 24, 2023

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023

Beyond Transfer Learning: Co-finetuning for Action Localisation

Jul 08, 2022

Object Scene Representation Transformer

Jun 14, 2022

ViViT: A Video Vision Transformer

Mar 29, 2021