Dimitrios Nikolopoulos

SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

Jun 11, 2025

Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri, Jiakun Fan, Dimitrios Nikolopoulos

Abstract: Despite advances in device capabilities, efficient inference of advanced large language models (LLMs) at the edge remains challenging due to limited device memory and power constraints. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper argues that speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, is a promising approach for edge computing when adapted to orchestrate computation across heterogeneous devices. We propose SLED, a method that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server efficiently batches and verifies the tokens using a more precise target model. This approach supports device heterogeneity and reduces server-side memory footprint by avoiding the need to deploy multiple target models. Our initial experiments with Jetson Orin Nano, Raspberry Pi 5, and an RTX 6000 edge server indicate substantial benefits: significantly reduced latency, improved energy efficiency, and increased concurrent inference sessions, all without sacrificing model accuracy.
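
The draft-then-verify loop that the abstract describes can be illustrated with a minimal sketch of greedy speculative decoding. This is not the SLED implementation: the function names, the toy models, and the draft length `k` are all assumptions made for illustration, and the batching of requests from multiple devices on the server is not modeled. A real verifier would score all drafted positions in one forward pass of the target model; here that is emulated with a loop.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: maps a token prefix to the next token (greedy).
NextToken = Callable[[Sequence[int]], int]


def draft_tokens(draft_model: NextToken, prefix: Sequence[int], k: int) -> List[int]:
    """Edge-device side (assumed): propose k candidate tokens autoregressively."""
    ctx = list(prefix)
    drafted = []
    for _ in range(k):
        token = draft_model(ctx)
        drafted.append(token)
        ctx.append(token)
    return drafted


def verify_tokens(target_model: NextToken, prefix: Sequence[int],
                  drafted: Sequence[int]) -> List[int]:
    """Server side (assumed): keep drafted tokens until the first mismatch.

    Always returns at least one token chosen by the target model: either the
    correction at the first mismatch, or one extra token if every draft matched.
    """
    ctx = list(prefix)
    accepted = []
    for token in drafted:
        expected = target_model(ctx)
        if expected != token:
            accepted.append(expected)  # replace the rejected draft token
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_model(ctx))  # all drafts accepted: add one more token
    return accepted


def speculative_decode(draft_model: NextToken, target_model: NextToken,
                       prompt: Sequence[int], max_new_tokens: int, k: int = 4) -> List[int]:
    """Alternate drafting and verification until max_new_tokens are generated."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        drafted = draft_tokens(draft_model, tokens, k)
        tokens.extend(verify_tokens(target_model, tokens, drafted))
    return tokens[: len(prompt) + max_new_tokens]


# Toy models for illustration only: the target counts modulo 10,
# and the draft model agrees except right after the token 4.
def toy_target(ctx: Sequence[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: Sequence[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Under greedy verification the output is identical to decoding with the target model alone, which is the sense in which accuracy is preserved; the gain comes from the target model checking several drafted tokens per round instead of producing one.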

* 6 pages, 9 figures, 2 tables

Incremental Training of Deep Convolutional Neural Networks

Mar 27, 2018

Roxana Istrate, Adelmo Cristiano Innocenza Malossi, Costas Bekas, Dimitrios Nikolopoulos

Abstract: We propose an incremental training method that partitions the original network into sub-networks, which are then gradually incorporated into the running network during the training process. To allow for smooth dynamic growth of the network, we introduce a look-ahead initialization that outperforms random initialization. We demonstrate that our incremental approach reaches the baseline accuracy of the reference network. Additionally, it makes it possible to identify smaller partitions of the original state-of-the-art network that deliver the same final accuracy while using only a fraction of the total number of parameters. This allows for a potential training-time speedup of several factors. We report training results on CIFAR-10 for ResNet and VGGNet.
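
As a rough illustration of growing a network stage by stage, the sketch below splits an ordered list of layers into sub-networks and reports which layers are active at a given epoch. The abstract does not specify how the network is partitioned or when sub-networks are added, so the contiguous split, the fixed `epochs_per_stage` schedule, and all function names are assumptions. The look-ahead initialization is not modeled.

```python
from typing import List, Sequence


def partition_layers(layers: Sequence[str], num_parts: int) -> List[List[str]]:
    """Split an ordered layer list into num_parts contiguous sub-networks.

    Sizes differ by at most one; earlier parts receive the extra layers.
    """
    if not 1 <= num_parts <= len(layers):
        raise ValueError("num_parts must be between 1 and the number of layers")
    base, extra = divmod(len(layers), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        parts.append(list(layers[start:start + size]))
        start += size
    return parts


def active_layers(parts: Sequence[Sequence[str]], epoch: int,
                  epochs_per_stage: int) -> List[str]:
    """Return the layers in the running network at a given epoch.

    Assumed schedule: one more sub-network is added every epochs_per_stage
    epochs until the full network is in place.
    """
    stage = min(epoch // epochs_per_stage, len(parts) - 1)
    return [layer for part in parts[: stage + 1] for layer in part]


# Hypothetical five-block network split into two sub-networks.
parts = partition_layers(["block1", "block2", "block3", "block4", "block5"], 2)
```

With this schedule, `active_layers(parts, 0, 10)` covers only the first sub-network, and from epoch 10 onward the full network is trained.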

* http://ceur-ws.org/Vol-1998
