Radu Soricut

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024

Wavelet-Based Image Tokenizer for Vision Transformers

May 28, 2024

ImageInWords: Unlocking Hyper-Detailed Image Descriptions

May 05, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning

Dec 19, 2023

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

Dec 01, 2023

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

Oct 18, 2023

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Oct 17, 2023

CausalLM is not optimal for in-context learning

Sep 03, 2023