Abstract: Large-scale deep neural networks (DNNs) exhibit excellent performance across a wide range of tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters. A main bottleneck is the resulting gradient aggregation overhead. While gradient compression and sparse collective communication techniques are commonly employed to alleviate network load, many gradient compression schemes fail to accelerate training while also preserving accuracy. This paper introduces PacTrain, a novel framework that accelerates distributed training by combining pruning with sparse gradient compression. Actively pruning the neural network makes the model weights and gradients sparse. By ensuring that all distributed training workers share global knowledge of the gradient sparsity pattern, we can perform lightweight compressed communication without harming accuracy. We show that the PacTrain compression scheme achieves a near-optimal compression strategy while remaining compatible with the all-reduce primitive. Experimental evaluations show that PacTrain improves training throughput by 1.25 to 8.72 times compared to state-of-the-art compression-enabled systems for representative vision and language model training tasks under bandwidth-constrained conditions.
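To make the aggregation idea concrete, below is a minimal PyTorch sketch of the mechanism as the abstract describes it: once every worker holds the same pruning mask, gradients can be compacted to the surviving coordinates and averaged with an ordinary dense all-reduce. This is not PacTrain's actual code; the helper names and the keep_ratio parameter are illustrative assumptions.

```python
import torch
import torch.distributed as dist

def make_global_mask(param: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Magnitude-based pruning mask (hypothetical helper). It is identical on
    every worker as long as parameters are synchronized; rank 0 could also
    broadcast the mask to guarantee global agreement."""
    k = max(1, int(keep_ratio * param.numel()))
    # The k-th largest magnitude is the (numel - k + 1)-th smallest.
    threshold = param.abs().flatten().kthvalue(param.numel() - k + 1).values
    return param.abs() >= threshold

def sparse_allreduce(grad: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Aggregate only the unpruned gradient entries with a dense all-reduce."""
    payload = grad[mask]                  # nnz values, same layout on all ranks
    dist.all_reduce(payload, op=dist.ReduceOp.SUM)
    payload /= dist.get_world_size()
    out = torch.zeros_like(grad)
    out[mask] = payload                   # scatter averaged values back
    return out
```

Because every rank selects the same coordinates, the payload has identical length and ordering everywhere, which is what keeps this style of compression compatible with the all-reduce primitive.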
Abstract: Connecting long-range wireless networks to the Internet imposes challenges due to vastly longer round-trip times (RTTs). In this paper, we present an ICN protocol framework that enables robust and efficient delay-tolerant communication to edge networks. Our approach provides ICN-idiomatic communication between networks with vastly different RTTs. We applied this framework to LoRa, enabling end-to-end consumer-to-LoRa-producer interaction over an ICN-Internet and asynchronous data production in the LoRa edge. Instead of using LoRaWAN, we implemented an IEEE 802.15.4e DSME MAC layer on top of the LoRa PHY and ICN protocol mechanisms in RIOT OS. Executed on off-the-shelf IoT hardware, we provide a comparative evaluation of basic NDN-style ICN [60], RICE [31]-like pulling, and reflexive forwarding [46]. This is the first practical evaluation of ICN over LoRa using a reliable MAC. Our results show that periodic polling in NDN is inefficient when facing long and differing RTTs. RICE reduces polling overhead and exploits gateway knowledge without violating ICN principles. Reflexive forwarding naturally accommodates sporadic data generation. Combined with a local data push, it operates efficiently and enables lifetimes of more than one year for battery-powered LoRa-ICN nodes.
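The lifetime claim follows from a simple duty-cycle argument: with periodic polling, the radio stays active for seconds per Interest because of long LoRa airtime and RTT waits, whereas event-driven reflexive forwarding only transmits when data actually exists. Below is a back-of-envelope Python sketch of that argument; every constant is an assumption chosen for illustration, not a measurement from the paper.

```python
# Illustrative duty-cycle model for a battery-powered LoRa-ICN node.
# All numbers below are assumptions, not figures from the evaluation.

BATTERY_MAH    = 2500.0   # assumed battery capacity
SLEEP_MA       = 0.005    # assumed deep-sleep current
EXCHANGE_MA    = 40.0     # assumed average radio current during an exchange
EXCHANGE_S     = 6.0      # assumed airtime + RTT wait per Interest/Data exchange
EVENTS_PER_DAY = 4        # assumed sporadic data-generation rate

def lifetime_days(exchanges_per_day: float) -> float:
    """Convert a radio duty cycle into battery lifetime (self-discharge
    and MAC maintenance traffic are ignored for simplicity)."""
    duty_s = exchanges_per_day * EXCHANGE_S
    avg_ma = (duty_s * EXCHANGE_MA + (86400.0 - duty_s) * SLEEP_MA) / 86400.0
    return BATTERY_MAH / avg_ma / 24.0

# Polling every 5 minutes vs. one exchange per actual data event:
print(f"5-min polling:        {lifetime_days(86400 / 300):7.0f} days")
print(f"reflexive forwarding: {lifetime_days(EVENTS_PER_DAY):7.0f} days")
```

Under these assumed numbers, polling drains the battery in a few months while the event-driven mode stays comfortably beyond a year, which matches the direction of the abstract's lifetime result.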