Abstract: Training modern deep learning models is increasingly constrained by GPU memory and compute limits. While Randomized Numerical Linear Algebra (RandNLA) offers proven techniques to compress these models, the lack of a unified, production-grade library has hindered widespread adoption of these methods. We present Panther, a PyTorch-compatible library that consolidates established RandNLA algorithms into a single high-performance framework. Panther provides efficient, drop-in replacements for standard components, including sketched linear layers, 2D convolutions, multi-head attention, and randomized matrix decompositions (such as pivoted CholeskyQR). Through a custom C++/CUDA backend (pawX), Panther offers an optimized implementation that runs on both CPUs and GPUs. We demonstrate the effectiveness of RandNLA techniques and Panther's ease of adoption: replacing standard PyTorch linear layers with Panther layers requires only a few lines of code and yields significant memory savings (up to 75%) on BERT while maintaining comparable loss. Source code is available (MIT License) at https://github.com/FahdSeddik/panther, along with a demonstration video at https://youtu.be/7M3RQb4KWxs.
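
A minimal sketch of the kind of drop-in swap described above, assuming a hypothetical module name and constructor (panther.nn.SketchedLinear and its sketch_dim argument are illustrative only; see the repository for Panther's actual interface):

    import torch
    import torch.nn as nn

    class TinyClassifier(nn.Module):
        def __init__(self, d_in=768, d_hidden=3072, n_classes=2):
            super().__init__()
            # Standard PyTorch layer ...
            self.fc1 = nn.Linear(d_in, d_hidden)
            # ... which, per the abstract, could be replaced by a sketched
            # Panther layer in one line, e.g. (hypothetical name/signature):
            # self.fc1 = panther.nn.SketchedLinear(d_in, d_hidden, sketch_dim=256)
            self.fc2 = nn.Linear(d_hidden, n_classes)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)))

    model = TinyClassifier()
    print(model(torch.randn(4, 768)).shape)  # torch.Size([4, 2])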




Abstract: A procedural level generator is a tool that generates levels from noise. One approach to building generators is machine learning, but given the rarity of training data, multiple methods have been proposed to train generators from nothing. However, level generation tasks tend to have sparse feedback, which is commonly mitigated using game-specific supplemental rewards. This paper proposes a novel approach to train generators from nothing by learning at multiple level sizes, starting from a small size and working up to the desired sizes. The approach exploits the observed phenomenon that feedback is denser at smaller sizes, avoiding the need for supplemental rewards. It also has the benefit of producing generators that can output levels at various sizes. We apply this approach to train controllable generators using generative flow networks, and we modify diversity sampling to be compatible with generative flow networks and to expand the expressive range. The results show that our methods can generate high-quality, diverse levels for Sokoban, Zelda, and Danger Dave across a variety of sizes after only 3h 29min to 6h 11min of training (depending on the game) on a single commodity machine. The results also show that our generators can output levels at sizes that were unavailable during training.
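
A minimal sketch of the multi-size training idea, using toy stand-ins (generate_level, reward, and the size schedule below are illustrative only and are not taken from the paper's code):

    import random

    TILES = ["wall", "floor", "crate", "goal"]

    def generate_level(width, height):
        # Toy stand-in for the GFlowNet generator, which would sample tiles
        # sequentially according to a learned policy.
        return [[random.choice(TILES) for _ in range(width)] for _ in range(height)]

    def reward(level):
        # Toy stand-in for a game-specific quality signal (e.g. solvability checks).
        return sum(tile != "wall" for row in level for tile in row)

    # Curriculum over level sizes: train on small, densely rewarded levels first,
    # then grow toward the desired sizes.
    sizes = [(3, 3), (5, 5), (7, 7), (10, 10)]
    for width, height in sizes:
        for _ in range(100):  # training steps at this size
            level = generate_level(width, height)
            r = reward(level)  # feedback that would drive a generator update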