Amirkeivan Mohtashami

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

Feb 04, 2024
Matteo Pagliardini, Amirkeivan Mohtashami, Francois Fleuret, Martin Jaggi

Social Learning: Towards Collaborative Learning with Large Language Models

Dec 18, 2023
Amirkeivan Mohtashami, Florian Hartmann, Sian Gooding, Lukas Zilka, Matt Sharifi, Blaise Aguera y Arcas

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Nov 27, 2023
Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

CoTFormer: More Tokens With Attention Make Up For Less Depth

Oct 16, 2023
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi

Landmark Attention: Random-Access Infinite Context Length for Transformers

May 25, 2023
Amirkeivan Mohtashami, Martin Jaggi

Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

Feb 07, 2023
Amirkeivan Mohtashami, Mauro Verzetti, Paul K. Rubenstein

On Avoiding Local Minima Using Gradient Descent With Large Learning Rates

May 30, 2022
Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

Feb 03, 2022
Amirkeivan Mohtashami, Sebastian Stich, Martin Jaggi

Simultaneous Training of Partially Masked Neural Networks

Jun 16, 2021
Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

Mar 03, 2021
Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi
