Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Arthur Lebeurrier

OCKHAM

A multilevel approach to accelerate the training of Transformers

Apr 24, 2025

Guillaume Lauga, Maël Chaumette, Edgar Desainte-Maréville, Étienne Lasalle, Arthur Lebeurrier

Abstract:In this article, we investigate the potential of multilevel approaches to accelerate the training of transformer architectures. Using an ordinary differential equation (ODE) interpretation of these architectures, we propose an appropriate way of varying the discretization of these ODE Transformers in order to accelerate the training. We validate our approach experimentally by a comparison with the standard training procedure.

Via

Access Paper or Ask Questions