Behnam Neyshabur

Shammie

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Dec 22, 2023
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Convexifying Transformers: Improving optimization and understanding of transformer networks

Nov 20, 2022
Via arXiv

Layer-Stack Temperature Scaling

Nov 18, 2022
Via arXiv

Teaching Algorithmic Reasoning via In-context Learning

Nov 15, 2022
Via arXiv

REPAIR: REnormalizing Permuted Activations for Interpolation Repair

Nov 15, 2022
Via arXiv

Revisiting Neural Scaling Laws in Language and Vision

Sep 13, 2022
Via arXiv

Exploring Length Generalization in Large Language Models

Jul 11, 2022
Via arXiv