Paolo Glorioso

The Unreasonable Ineffectiveness of the Deeper Layers

Mar 26, 2024
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

Via arXiv

BlackMamba: Mixture of Experts for State-Space Models

Feb 01, 2024
Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge

Via arXiv

Flatter, faster: scaling momentum for optimal speedup of SGD

Oct 28, 2022
Aditya Cowsik, Tankut Can, Paolo Glorioso

Via arXiv