Haihao Shen

Efficient LLM Inference on CPUs
Nov 01, 2023
Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng

TEQ: Trainable Equivalent Transformation for Quantization of LLMs
Oct 17, 2023
Wenhua Cheng, Yiyang Cai, Kaokao Lv, Haihao Shen

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Sep 28, 2023
Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv

Efficient Post-training Quantization with FP8 Formats
Sep 26, 2023
Haihao Shen, Naveen Mellempudi, Xin He, Qun Gao, Chang Wang, Mengni Wang

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs
Jun 28, 2023
Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, Moshe Wasserblat

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
Oct 31, 2022
Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen

Fast DistilBERT on CPUs
Oct 27, 2022
Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat

Prune Once for All: Sparse Pre-Trained Language Models
Nov 10, 2021
Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat

Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe
May 04, 2018
Jiong Gong, Haihao Shen, Guoming Zhang, Xiaoli Liu, Shane Li, Ge Jin, Niharika Maheshwari, Evarist Fomenko, Eden Segal