Abstract: Training large language models is generally done via optimization methods on clusters containing tens of thousands of accelerators, communicating over a high-bandwidth interconnect. Scaling up these clusters is expensive and can become impractical, imposing limits on the size of models that can be trained. Several recent studies have proposed training methods that are less communication-intensive, avoiding the need for a highly connected compute cluster. These state-of-the-art low-communication training methods still employ a synchronization step for model parameters which, when performed over all model replicas, can become costly on a low-bandwidth network. In this work, we propose a novel optimization method, NoLoCo, that does not explicitly synchronize all model parameters during training and, as a result, does not require any collective communication. NoLoCo implicitly synchronizes model weights via a novel variant of the Nesterov momentum optimizer that partially averages each replica's weights with those of one randomly selected other replica. We provide both a theoretical convergence analysis for our proposed optimizer and empirical results from language model training. We benchmark NoLoCo over a wide range of accelerator counts and model sizes, from 125M to 6.8B parameters. Our method requires significantly less communication overhead than fully sharded data parallel training, or even the widely used low-communication training method DiLoCo. The synchronization step itself is estimated to be one order of magnitude faster than the all-reduce used in DiLoCo when a few hundred accelerators train over the internet. NoLoCo also has no global blocking communication, which reduces accelerator idle time. Compared to DiLoCo, we also observe up to $4\%$ faster convergence across a wide range of model sizes and accelerator counts.
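The abstract does not give NoLoCo's exact update rule, so the following is only a minimal sketch of the general idea it describes: each replica takes local Nesterov-momentum steps, and synchronization happens through occasional partial averaging with one randomly chosen peer rather than an all-reduce over all replicas. The mixing coefficient `alpha`, the gossip schedule, and all function names here are hypothetical stand-ins, not the paper's method.

```python
import random
import numpy as np

def nesterov_step(weights, velocity, grad, lr=0.01, mu=0.9):
    """One Nesterov-momentum update on a flat parameter vector
    (lookahead-free reformulation of the classic update)."""
    velocity = mu * velocity - lr * grad
    weights = weights + mu * velocity - lr * grad
    return weights, velocity

def partial_average(own, peer, alpha=0.5):
    """Pull one replica's weights toward a random peer's weights.

    alpha is a hypothetical mixing coefficient; the paper's exact
    schedule and its coupling with the momentum term are not given here.
    """
    return (1.0 - alpha) * own + alpha * peer

# Toy loop over replicas: each replica takes a local step, then one
# replica partially averages with one random peer -- no collective
# communication is ever needed.
n_replicas, dim = 4, 8
weights = [np.zeros(dim) for _ in range(n_replicas)]
velocity = [np.zeros(dim) for _ in range(n_replicas)]

for step in range(100):
    for i in range(n_replicas):
        grad = np.random.randn(dim)  # stand-in for a real minibatch gradient
        weights[i], velocity[i] = nesterov_step(weights[i], velocity[i], grad)
    i = random.randrange(n_replicas)
    j = random.choice([k for k in range(n_replicas) if k != i])
    weights[i] = partial_average(weights[i], weights[j])
```

Because each averaging step touches only a pair of replicas, no replica ever blocks on a global barrier, which is the property the abstract credits for reduced accelerator idle time.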
Abstract: The particle-in-cell numerical method of plasma physics balances a trade-off between computational cost and intrinsic noise. Inference on data produced by these simulations generally consists of binning the data to recover the particle distribution function, from which physical processes may be investigated. In addition to containing noise, the distribution function is temporally dynamic and can be non-Gaussian and multi-modal, making it difficult to model. Here we demonstrate the use of normalizing flows to learn a smooth, tractable approximation to the noisy particle distribution function. We show that the resulting data-driven likelihood conserves relevant physics and may be extended to encapsulate the temporal evolution of the distribution function.
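To illustrate the technique the abstract names, here is a minimal, self-contained sketch of fitting a normalizing flow to noisy samples by maximum likelihood. The RealNVP-style coupling architecture, the synthetic two-stream-like velocity data, and all hyperparameters are assumptions for illustration only; the paper's actual flow architecture and PIC data are not specified here.

```python
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """RealNVP-style affine coupling for 2-D data; `keep` is the index
    of the coordinate passed through unchanged."""
    def __init__(self, keep):
        super().__init__()
        self.keep, self.move = keep, 1 - keep
        self.net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 2))

    def forward(self, x):
        xk = x[:, self.keep].unsqueeze(1)
        xm = x[:, self.move].unsqueeze(1)
        s, t = self.net(xk).chunk(2, dim=1)
        ym = xm * torch.exp(s) + t          # invertible affine transform
        out = torch.cat([xk, ym], dim=1) if self.keep == 0 else torch.cat([ym, xk], dim=1)
        return out, s.squeeze(1)            # log |det Jacobian| = s

class Flow(nn.Module):
    """Stack of couplings mapping data to a standard normal base."""
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([Coupling(k % 2) for k in range(4)])
        self.base = torch.distributions.Normal(0.0, 1.0)

    def log_prob(self, x):
        logdet = 0.0
        for layer in self.layers:
            x, ld = layer(x)
            logdet = logdet + ld
        # Change of variables: log p(x) = log p_z(f(x)) + log |det df/dx|
        return self.base.log_prob(x).sum(dim=1) + logdet

# Stand-in for binned PIC particle velocities: a noisy, bimodal sample
# loosely resembling a two-stream distribution; real data would replace this.
data = torch.cat([torch.randn(5000, 2) * 0.5 - 1.5,
                  torch.randn(5000, 2) * 0.5 + 1.5])

flow = Flow()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for epoch in range(200):
    opt.zero_grad()
    loss = -flow.log_prob(data).mean()      # maximum-likelihood fit
    loss.backward()
    opt.step()
# flow.log_prob(...) now evaluates a smooth, tractable density estimate
# of the noisy particle distribution function.
```

After training, the learned `log_prob` plays the role of the data-driven likelihood the abstract mentions: a smooth surrogate for the binned distribution function that can be queried at arbitrary points.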