
Aniruddha Nrusimha

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
May 21, 2024

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Apr 04, 2024

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Feb 07, 2024

Towards Verifiable Text Generation with Symbolic References
Nov 15, 2023

Striped Attention: Faster Ring Attention for Causal Transformers
Nov 15, 2023

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
Oct 07, 2019