The transformer architecture from Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding. We propose DenseFormer, a simple modification to the standard architecture that improves the perplexity of the model without noticeably increasing its size, adding only a few thousand parameters even for large-scale models in the 100B-parameter range. Our approach relies on an additional averaging step after each transformer block, which computes a weighted average of the current and past representations; we refer to this operation as Depth-Weighted-Average (DWA). The learned DWA weights exhibit coherent patterns of information flow, revealing strong and structured reuse of activations from distant layers. Experiments demonstrate that DenseFormer is more data efficient, reaching the same perplexity as much deeper transformer models, and that for the same perplexity, these new models outperform transformer baselines in terms of memory efficiency and inference time.
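The sketch below illustrates the Depth-Weighted-Average idea in PyTorch; the class and parameter names are ours for illustration and not the paper's implementation, with the weights simply initialized to the identity so that the untrained model matches a standard transformer.

```python
import torch
import torch.nn as nn

class DWAStack(nn.Module):
    """Illustrative sketch of Depth-Weighted-Average (DWA): after each block,
    the representation is replaced by a learned weighted average of the block
    input (embeddings) and all block outputs produced so far."""

    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        n = len(blocks)
        # alpha[i, j] weights representation j in the average taken after block i.
        # Identity initialization (weight 1 on the current output) recovers a
        # standard transformer; the extra parameters number only O(n^2).
        alpha = torch.zeros(n, n + 1)
        for i in range(n):
            alpha[i, i + 1] = 1.0
        self.alpha = nn.Parameter(alpha)

    def forward(self, x):
        reps = [x]  # reps[0] holds the embedded input
        for i, block in enumerate(self.blocks):
            reps.append(block(reps[-1]))
            w = self.alpha[i, : i + 2]  # weights for reps[0..i+1]
            reps[-1] = sum(wj * r for wj, r in zip(w, reps))
        return reps[-1]
```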
We introduce the framework of "social learning" in the context of large language models (LLMs), whereby models share knowledge with each other in a privacy-aware manner using natural language. We present and evaluate two approaches for knowledge transfer between LLMs. In the first scenario, we allow the model to generate abstract prompts aiming to teach the task. In our second approach, models transfer knowledge by generating synthetic examples. We evaluate these methods across diverse datasets and quantify memorization as a proxy for privacy loss. These techniques inspired by social learning yield promising results with low memorization of the original data. In particular, we show that performance using these methods is comparable to that obtained with the original labels and prompts. Our work demonstrates the viability of social learning for LLMs, establishes baseline approaches, and highlights several unexplored areas for future work.
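As a minimal sketch of the two transfer approaches, assuming the teacher and student are exposed as generic text-completion callables (the function names and prompt wording below are ours, not the paper's):

```python
from typing import Callable, List

def transfer_via_instruction(teacher: Callable[[str], str], task_description: str) -> str:
    """Approach 1: the teacher writes an abstract prompt that teaches the task
    without copying its private examples."""
    return teacher(
        "You have learned the following task from private examples:\n"
        f"{task_description}\n"
        "Write a short instruction that teaches another model to solve this task. "
        "Do not reproduce any example you have seen."
    )

def transfer_via_synthetic_examples(teacher: Callable[[str], str],
                                    task_description: str, k: int = 8) -> List[str]:
    """Approach 2: the teacher generates new, synthetic examples of the task."""
    return [teacher(f"Task: {task_description}\nWrite one new example (input and label):")
            for _ in range(k)]

def student_answer(student: Callable[[str], str], taught_material: str, query: str) -> str:
    """The student answers using only what the teacher shared, never the private data."""
    return student(f"{taught_material}\n\nInput: {query}\nAnswer:")
```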
Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer) and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.
The race to continually develop ever larger and deeper foundational models is underway. However, techniques like the Chain-of-Thought (CoT) method continue to play a pivotal role in achieving optimal downstream performance. In this work, we establish an approximate parallel between using chain-of-thought and employing a deeper transformer. Building on this insight, we introduce CoTFormer, a transformer variant that employs an implicit CoT-like mechanism to achieve capacity comparable to a deeper model. Our empirical findings demonstrate the effectiveness of CoTFormers, as they significantly outperform larger standard transformers.
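As a toy sketch of the underlying intuition only, one can think of reusing the same blocks for several passes so that effective depth grows without adding parameters; this simplification is ours and is not the CoTFormer architecture itself, which interleaves representations from earlier passes so that later passes can attend to them.

```python
import torch.nn as nn

class RepeatedBlocks(nn.Module):
    """Toy illustration of an implicit CoT-like mechanism via weight reuse:
    the same stack of blocks is applied n_repeats times, mimicking a deeper
    model at no extra parameter cost. Not the CoTFormer architecture."""

    def __init__(self, blocks, n_repeats: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.n_repeats = n_repeats

    def forward(self, x):
        for _ in range(self.n_repeats):
            for block in self.blocks:
                x = block(x)
        return x
```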
While transformers have shown remarkable success in natural language processing, their attention mechanism's large memory requirements have limited their ability to handle longer contexts. Prior approaches, such as recurrent memory or retrieval-based augmentation, have either compromised the random-access flexibility of attention (i.e., the capability to select any token in the entire context) or relied on separate mechanisms for relevant context retrieval, which may not be compatible with the model's attention. In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context. Our method uses a landmark token to represent each block of the input and trains the attention to use it for selecting relevant blocks, enabling retrieval of blocks directly through the attention mechanism instead of relying on a separate mechanism. Our approach seamlessly integrates with specialized data structures and the system's memory hierarchy, enabling processing of arbitrarily long context lengths. We demonstrate that our method obtains performance comparable to Transformer-XL while significantly reducing the number of retrieved tokens in each step. Finally, we show that fine-tuning LLaMA 7B with our method successfully extends its context length capacity up to 32k tokens, allowing for inference at the context lengths of GPT-4.
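A simplified sketch of the retrieval step, with shapes and names of our choosing (training of the landmark tokens and the grouped attention used in the paper are not shown): each block of past tokens is scored by the query's attention to that block's landmark key, and only the top-scoring blocks are attended to.

```python
import torch

def retrieve_blocks_via_landmarks(q, landmark_keys, block_keys, block_values, k=2):
    """Score each block of past tokens by attention to its landmark key,
    keep the top-k blocks, and attend only over their tokens.

    q:              (d,)                      current query
    landmark_keys:  (n_blocks, d)             one landmark key per block
    block_keys:     (n_blocks, block_len, d)  keys of tokens inside each block
    block_values:   (n_blocks, block_len, d)  values of tokens inside each block
    """
    d = q.shape[-1]
    block_scores = landmark_keys @ q / d**0.5                 # (n_blocks,)
    top = block_scores.topk(min(k, block_scores.numel())).indices
    keys = block_keys[top].reshape(-1, d)                      # (k * block_len, d)
    values = block_values[top].reshape(-1, d)
    attn = torch.softmax(keys @ q / d**0.5, dim=0)             # (k * block_len,)
    return attn @ values                                        # (d,)
```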
Learned metrics such as BLEURT have in recent years become widely employed to evaluate the quality of machine translation systems. Training such metrics requires data that can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from large language models (LLMs) to improve such learned metrics without requiring human annotators, by creating synthetic datasets that can be mixed into existing datasets and require only a corpus of text in the target language. We show that the performance of a BLEURT-like model on lower-resource languages can be improved in this way.
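One way such a synthetic dataset could be assembled is sketched below, assuming access to an LLM through a generic generate() callable; the prompts, degradation scheme, and scoring are illustrative assumptions, not the paper's recipe.

```python
from typing import Callable, List, Tuple

def make_synthetic_metric_data(generate: Callable[[str], str],
                               corpus: List[str],
                               levels: int = 3) -> List[Tuple[str, str, float]]:
    """Build (candidate, reference, quality) triples for training a BLEURT-like
    metric from a monolingual corpus: each sentence serves as the reference, and
    the LLM produces candidates degraded to a requested severity."""
    data = []
    for reference in corpus:
        for level in range(levels + 1):
            if level == 0:
                candidate = reference  # an unedited sentence acts as a perfect candidate
            else:
                candidate = generate(
                    f"Rewrite the sentence below, introducing translation-style errors "
                    f"of severity {level} on a scale of 1 to {levels}:\n{reference}"
                )
            data.append((candidate, reference, 1.0 - level / levels))
    return data
```

Triples produced this way can then be mixed into an existing metric-training set.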
It has been widely observed in the training of neural networks that a large step size is essential for obtaining superior models with gradient descent (GD). However, the effect of large step sizes on the success of GD is not well understood theoretically. We argue that a complete understanding of the mechanics leading to GD's success may indeed require considering the effects of using a large step size. To support this claim, we prove, on a certain class of functions, that GD with a large step size follows a different trajectory than GD with a small step size, leading to convergence to the global minimum. We also demonstrate the difference in trajectories for small and large learning rates when GD is applied to a neural network, observing an escape from a local minimum when the step size is large, which shows that this behavior is indeed relevant in practice. Finally, through a novel set of experiments, we show that even though stochastic noise is beneficial, it is not enough to explain the success of SGD, and a large learning rate is essential for obtaining the best performance even in stochastic settings.
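The following toy 1-D run (our construction, not the function class analyzed in the paper) shows the qualitative effect: the objective has a flat global minimum at x = 0 and a narrow, sharp local minimum near x = 1; started at x = 1, a small step size settles in the sharp local minimum, while a large step size bounces out of it and reaches the global minimum.

```python
import numpy as np

def f(x):
    # Wide quadratic basin at x = 0 plus a narrow Gaussian dip near x = 1.
    return 0.5 * x**2 - 0.3 * np.exp(-((x - 1) ** 2) / (2 * 0.1**2))

def grad(x):
    return x + 0.3 * (x - 1) / 0.1**2 * np.exp(-((x - 1) ** 2) / (2 * 0.1**2))

def gd(x0, lr, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_small = gd(x0=1.0, lr=0.01)  # stays in the sharp local minimum (x near 0.97)
x_large = gd(x0=1.0, lr=0.15)  # escapes it and reaches the flat global minimum (x near 0)
print(f"small step: x = {x_small:.3f}, f(x) = {f(x_small):.3f}")
print(f"large step: x = {x_large:.3f}, f(x) = {f(x_large):.3f}")
```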
While SGD, which samples from the data with replacement, is widely studied in theory, a variant called Random Reshuffling (RR) is more common in practice. RR iterates through random permutations of the dataset and has been shown to converge faster than SGD. When the order is chosen deterministically, a variant called incremental gradient descent (IG), the existing convergence bounds show improvement over SGD but are worse than those for RR. However, these bounds do not differentiate between a good and a bad ordering and hold for the worst choice of order. Meanwhile, in some cases, choosing the right order when using IG can lead to faster convergence than RR. In this work, we quantify the effect of the order on convergence speed, obtaining convergence bounds based on the chosen sequence of permutations while also recovering previous results for RR. In addition, we show, both in theory and in practice, the benefits of structured shuffling when various levels of abstraction (e.g., tasks, classes, augmentations) exist in the dataset. Finally, relying on our measure, we develop a greedy algorithm for choosing good orders during training, achieving superior performance (by more than 14 percent in accuracy) over RR.
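To make the three example-ordering schemes from the abstract concrete, here is a minimal sketch of the index sequences each one produces (the greedy order-selection algorithm itself is not shown):

```python
import random

def sgd_order(n, epochs):
    """With-replacement SGD: each step draws an example index uniformly at random."""
    return [random.randrange(n) for _ in range(n * epochs)]

def rr_order(n, epochs):
    """Random Reshuffling (RR): a fresh random permutation of the dataset every epoch."""
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        random.shuffle(perm)
        order.extend(perm)
    return order

def ig_order(n, epochs):
    """Incremental Gradient (IG): the same fixed, deterministic order every epoch.
    Which fixed order is chosen is exactly what the order-dependent bounds capture."""
    return list(range(n)) * epochs
```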
For deploying deep learning models to lower-end devices, it is necessary to train less resource-demanding variants of state-of-the-art architectures. This does not eliminate the need for the more expensive models, as they achieve higher performance. In order to avoid training two separate models, we show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance. We extend prior methods, which focused only on core networks of smaller width, to support arbitrary core network architectures. Our proposed training scheme switches consecutively between optimizing only the core part of the network and the full one. The accuracy of the full model remains comparable, while the core network achieves better performance than when it is trained in isolation. In particular, we show that training a Transformer with a low-rank core gives a low-rank model with better performance than training the low-rank model alone. We analyze our training scheme theoretically, and show its convergence under assumptions that are either standard or practically justified. Moreover, we show that the developed theoretical framework allows analyzing many other partial training schemes for neural networks.
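A minimal sketch of the alternating scheme, assuming the core and full forward passes are provided as callables (the names below are ours; parameters not used in a step's forward pass receive no gradient, so core-only steps update only the core):

```python
import itertools
import torch

def alternating_core_full_training(model, core_forward, full_forward,
                                   data_loader, loss_fn, steps, lr=1e-3):
    """Alternate between optimizing only the core subnetwork (even steps)
    and the full network (odd steps)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    batches = itertools.cycle(data_loader)
    for step in range(steps):
        x, y = next(batches)
        forward = core_forward if step % 2 == 0 else full_forward
        loss = loss_fn(forward(model, x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

For the low-rank Transformer example from the abstract, core_forward could, for instance, evaluate the model using only the low-rank part of each weight matrix.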
It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and, in asynchronous implementations, on the gradient staleness. In particular, it has been observed that the speedup saturates beyond a certain batch size and/or when the delays grow too large. We identify a data-dependent parameter that explains the speedup saturation in both of these settings. Our comprehensive theoretical analysis, for strongly convex, convex, and non-convex settings, unifies and generalizes prior lines of work that often focused on only one of these two aspects. In particular, our approach allows us to derive improved speedup results under frequently considered sparsity assumptions. Our insights give rise to theoretically grounded guidelines on how the learning rates can be adjusted in practice. We show that our results are tight and illustrate key findings in numerical experiments.