
Michael Andersch

Reducing Activation Recomputation in Large Transformer Models

May 10, 2022
Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

Apr 26, 2018
Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie
