Zafir Stojanovski

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

May 30, 2025

Zafir Stojanovski, Oliver Stanley, Joe Sharratt, Richard Jones, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf

Abstract: We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains, including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and various common games. Its key innovation is the ability to generate virtually infinite training data with adjustable complexity, unlike most previous reasoning datasets, which are typically fixed. This procedural generation approach allows for continuous evaluation across varying difficulty levels. Our experimental results demonstrate the efficacy of RG both for evaluating reasoning models and for training them with reinforcement learning.

* For code, see https://github.com/open-thought/reasoning-gym
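
The abstract's idea of pairing a procedural generator with a verifier can be illustrated with a minimal sketch. This is not the Reasoning Gym API: the function names (`generate_item`, `make_dataset`, `score_answer`), the addition task, and the way `difficulty` scales the problem are all hypothetical choices made for illustration.

```python
import random


def generate_item(rng, difficulty):
    """Build one addition problem (hypothetical task).

    `difficulty` controls both the number of terms and their magnitude.
    """
    terms = [rng.randint(1, 10 ** difficulty) for _ in range(difficulty + 1)]
    question = " + ".join(str(t) for t in terms) + " = ?"
    return {"question": question, "answer": str(sum(terms))}


def make_dataset(size, seed, difficulty):
    """Generate `size` items deterministically from `seed`."""
    rng = random.Random(seed)
    return [generate_item(rng, difficulty) for _ in range(size)]


def score_answer(model_output, entry):
    """Verifiable reward: 1.0 for an exact match with the known answer, else 0.0."""
    return 1.0 if model_output.strip() == entry["answer"] else 0.0


if __name__ == "__main__":
    for entry in make_dataset(size=3, seed=0, difficulty=2):
        print(entry["question"], "reward:", score_answer(entry["answer"], entry))
```

Because each item is derived from a seed and a difficulty setting rather than stored, new data can be produced on demand, and the reward is computed by checking the answer instead of relying on human labels.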

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

Nov 06, 2022

Zafir Stojanovski, Karsten Roth, Zeynep Akata

Abstract: Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where continuous adaptation has to be performed as new task distributions are introduced sequentially. In this work, we show that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements for CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over $+4\%$ on standard CL benchmarks, while in parts reducing the error relative to the upper limit of jointly training on all tasks at once by more than half, allowing the continual learner to inch closer to the joint training limits.

* First Workshop on Interpolation Regularizers and Beyond, NeurIPS 2022 (Spotlight) and Workshop on Distribution Shifts, NeurIPS 2022
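
The abstract does not state the update rule, so the following is only a sketch of one common form of momentum-based weight interpolation: a slowly moving copy of the weights is updated as an exponential moving average of the fine-tuned weights. The function name `momentum_interpolate`, the parameter `tau`, and the flat-list weight representation are assumptions for illustration, not the authors' implementation.

```python
def momentum_interpolate(slow, fast, tau=0.99):
    """Return tau * slow + (1 - tau) * fast, element by element.

    `slow` is the slowly updated copy (assumed to start at the zero-shot weights),
    `fast` is the copy being fine-tuned on the current task.
    A `tau` close to 1 keeps the slow weights near their starting point.
    """
    return [tau * s + (1.0 - tau) * f for s, f in zip(slow, fast)]


if __name__ == "__main__":
    slow = [1.0, 2.0]  # stand-in for zero-shot weights
    fast = [3.0, 4.0]  # stand-in for fine-tuned weights
    for step in range(3):
        slow = momentum_interpolate(slow, fast, tau=0.5)
        print(step, slow)
```

Under this assumed rule, the slow weights drift towards the fine-tuned ones only gradually, which is one way to retain some of the zero-shot model's robustness while still adapting to new tasks.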
