Zhanpeng Zeng

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

Mar 12, 2024
Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh


LookupFFN: Making Transformers Compute-lite for CPU inference

Mar 12, 2024
Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh


FrameQuant: Flexible Low-Bit Quantization for Transformers

Mar 10, 2024
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang, Vikas Singh


Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens

May 07, 2023
Zhanpeng Zeng, Cole Hawkins, Mingyi Hong, Aston Zhang, Nikolaos Pappas, Vikas Singh, Shuai Zheng


Multi Resolution Analysis (MRA) for Approximate Self-Attention

Jul 21, 2022
Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh


You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Nov 18, 2021
Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh


Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

Mar 05, 2021
Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh
