Nathan Godey

Gaperon: A Peppered English-French Generative Language Model Suite

Oct 29, 2025

Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression

Mar 04, 2025

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Apr 11, 2024

On the Scaling Laws of Geographical Representation in Language Models

Mar 04, 2024

Anisotropy Is Inherent to Self-Attention in Transformers

Jan 24, 2024

Headless Language Models: Learning without Predicting with Contrastive Weight Tying

Sep 15, 2023

Is Anisotropy Inherent to Transformers?

Jun 13, 2023

MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling

Dec 14, 2022