Byron Xu

Dion: A Communication-Efficient Optimizer for Large Models

Apr 07, 2025

Kwangjun Ahn, Byron Xu

Abstract: Training large AI models efficiently requires distributing computation across multiple accelerators, but this often incurs significant communication overhead, especially during gradient synchronization. We introduce Dion, a communication-efficient optimizer that retains the synchronous semantics of standard distributed training (e.g., DDP, FSDP) while substantially reducing I/O costs. Unlike conventional optimizers that synchronize full gradient matrices, Dion leverages orthonormalized updates with device-local momentum buffers, eliminating the need for full gradient exchange. It further supports an efficient sharding strategy that avoids reconstructing large matrices during training.

* technical report; comments welcome!
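
The abstract does not spell out how the update is computed, so the following is only a generic, single-device sketch of what an "orthonormalized update" with a momentum buffer can look like. It is not Dion's algorithm. The function names (`orthonormalize`, `momentum_step`), the SVD-based orthonormalization, and the hyperparameters `lr` and `beta` are all assumptions made for illustration, and no communication or sharding is modeled.

```python
import numpy as np


def orthonormalize(matrix):
    """Return the nearest matrix with orthonormal columns (polar factor).

    Illustrative assumption: a thin SVD is used here; the abstract does
    not say how Dion orthonormalizes its updates.
    """
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt


def momentum_step(weight, grad, momentum, lr=0.01, beta=0.9):
    """One hypothetical optimizer step on a single device.

    The momentum buffer stays local; only the orthonormalized
    direction is applied to the weights.
    """
    momentum = beta * momentum + grad      # accumulate gradient locally
    update = orthonormalize(momentum)      # discard singular-value scale
    return weight - lr * update, momentum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weight = rng.standard_normal((6, 3))
    grad = rng.standard_normal((6, 3))
    momentum = np.zeros_like(weight)

    weight, momentum = momentum_step(weight, grad, momentum)
    direction = orthonormalize(momentum)
    # Columns of the applied direction are orthonormal.
    print(np.allclose(direction.T @ direction, np.eye(3)))
```

Because the orthonormalized direction has unit singular values, the step size is governed by `lr` alone and not by the magnitude of the gradient, which is one reason such updates are attractive for matrix-shaped parameters.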

PcLast: Discovering Plannable Continuous Latent States

Nov 06, 2023

Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni (+4 more)

Abstract: Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations. While compact latent representations, typically learned from variational autoencoders or inverse dynamics, enable goal-conditioned planning, they ignore state affordances, which hampers sample-efficient planning. In this paper, we learn a representation that associates reachable states together for effective onward planning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based and reward-free settings show significant improvements in sampling efficiency, and the learned representation yields layered state abstractions that enable computationally efficient hierarchical planning.

* Pre-Print
