Federated inference enhances LLM performance in edge computing by weighted-averaging the predictions of distributed models. However, autoregressive LLM inference requires a full-model forward pass on every worker for each generated token, severely limiting decoding throughput. Distributed deployment further aggravates this throughput limitation with a communication bottleneck: each worker must transmit a full token probability distribution per draft token, which dominates end-to-end latency. To address these challenges, we introduce speculative decoding to parallelize LLM processing and propose a top-K compressed transmission scheme with two server-side reconstruction strategies. We theoretically analyze the robustness of our method in terms of local reconstruction error, aggregation bias, and acceptance-rate bias, and derive corresponding bounds. Experiments demonstrate that our scheme achieves high generation fidelity while significantly reducing communication overhead.
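To make the transmission scheme concrete, the sketch below illustrates one plausible reading of top-K compression with two server-side reconstruction strategies. The function names and the two reconstruction rules shown (renormalizing over the top-K support, or spreading the residual mass uniformly over the unsent tokens) are illustrative assumptions; the abstract does not specify the exact strategies or their implementation.

```python
import numpy as np

def topk_compress(p: np.ndarray, k: int):
    """Keep only the k largest probabilities; transmit (indices, values)
    instead of the full vocabulary-sized distribution."""
    idx = np.argpartition(p, -k)[-k:]  # indices of the k largest entries
    return idx, p[idx]

def reconstruct_renormalize(idx, vals, vocab_size: int):
    """Assumed strategy A: restrict the distribution to the top-K support
    and renormalize, assigning zero probability to unsent tokens."""
    q = np.zeros(vocab_size)
    q[idx] = vals / vals.sum()
    return q

def reconstruct_uniform_tail(idx, vals, vocab_size: int):
    """Assumed strategy B: keep the transmitted values as-is and spread the
    residual mass 1 - sum(vals) uniformly over the unsent tokens."""
    tail = (1.0 - vals.sum()) / (vocab_size - len(idx))
    q = np.full(vocab_size, tail)
    q[idx] = vals
    return q

# Toy usage: compress one worker's distribution, reconstruct it server-side,
# and inspect how much probability mass top-K retains.
vocab, k = 32000, 50
rng = np.random.default_rng(0)
p = rng.dirichlet(np.full(vocab, 0.05))  # a peaked, LLM-like distribution
idx, vals = topk_compress(p, k)
q = reconstruct_uniform_tail(idx, vals, vocab)
print(f"mass kept by top-{k}: {vals.sum():.3f}, L1 error: {np.abs(p - q).sum():.4f}")
```

Under this reading, the local reconstruction error analyzed in the paper would correspond to the gap between p and q above, and the server would aggregate the reconstructed per-worker distributions by weighted averaging before the speculative accept/reject step.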