FlashDecoding++: Faster Large Language Model Inference on GPUs

Nov 10, 2023
Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang

View paper on arXiv