Justin Gilmer

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Dec 11, 2023

Small-scale proxies for large-scale Transformer training instabilities

Sep 25, 2023

Replacing softmax with ReLU in Vision Transformers

Sep 15, 2023

Benchmarking Neural Network Training Algorithms

Jun 12, 2023

Improving Training Stability for Multitask Ranking Models in Recommender Systems

Feb 17, 2023

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023

Do Current Multi-Task Optimization Methods in Deep Learning Even Help?

Sep 23, 2022

Adaptive Gradient Methods at the Edge of Stability

Jul 29, 2022