Zhihang Yuan

LLM Inference Unveiled: Survey and Roofline Model Insights

Mar 11, 2024
Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

Feb 20, 2024
Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Feb 13, 2024
Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan

MIM4DD: Mutual Information Maximization for Dataset Distillation

Dec 27, 2023
Yuzhang Shang, Zhihang Yuan, Yan Yan

Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting

Dec 17, 2023
Dawei Yang, Ning He, Xing Hu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

Dec 10, 2023
Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun

PB-LLM: Partially Binarized Large Language Models

Sep 29, 2023
Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong
