
Matteo Pagliardini

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

Sep 17, 2025

Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Feb 27, 2025

Leveraging the true depth of LLMs

Feb 05, 2025

The AdEMAMix Optimizer: Better, Faster, Older

Sep 05, 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

Feb 04, 2024

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Nov 27, 2023

DoGE: Domain Reweighting with Generalization Estimation

Oct 23, 2023

CoTFormer: More Tokens With Attention Make Up For Less Depth

Oct 16, 2023

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

Jun 01, 2023

Revisiting the ACVI Method for Constrained Variational Inequalities

Oct 27, 2022