Abstract:Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking any learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns to perform structured memory operations {ADD, UPDATE, DELETE, NOOP}, and an Answer Agent that selects the most relevant entries and reasons over them to produce an answer. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management and use with minimal supervision. With as few as 152 question-answer pairs and a corresponding temporal memory bank for training, Memory-R1 outperforms the most competitive existing baseline and demonstrates strong generalization across diverse question types and LLM backbones. Beyond presenting an effective approach, this work provides insights into how RL can unlock more agentic, memory-aware behaviors in LLMs, pointing toward richer, more persistent reasoning systems.
Abstract:It is common practice to use large computational resources to train neural networks, as is known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used for searching solutions for combinatorial optimization problems. In this paper, we propose to apply a hash function based distributed parallel Monte-Carlo Tree Search (MCTS) to a real-world problem of molecular design. By running our massively parallel MCTS combined with a simple RNN on 1024 CPU cores for 10 minutes, we achieved a score on a molecular design problem that significantly outperforms existing work. Whereas existing studies on massively scalable parallel MCTS only compare the number of rollouts, we prove the practicality of the algorithm by comparing the quality of the solutions obtained in practice. This method is generic and is expected to speed up other applications of MCTS.
Abstract:We introduce Bee$^+$, a 95-mg four-winged microrobot with improved controllability and open-loop-response characteristics with respect to those exhibited by state-of-the-art two-winged microrobots with the same size and similar weight (i.e., the 75-mg Harvard RoboBee and similar prototypes). The key innovation that made possible the development of Bee$^+$ is the introduction of an extremely light (28-mg) twinned unimorph actuator, which enabled the design of a new microrobotic mechanism that flaps four wings independently. A first main advantage of the proposed design, compared to two-winged RoboBee-like flyers, is that by increasing the number of actuators from two to four, the number of direct control inputs increases from three (roll-torque, pitch-torque and thrust-force) to four (roll-torque, pitch-torque, yaw-torque and thrust-force) when simple sinusoidal excitations are employed. A second advantage of Bee$^+$ is that its four-wing configuration and flapping mode naturally damped the rotational disturbances that commonly affect the yaw degree of freedom of two-winged microrobots. In addition, the design of Bee$^+$ greatly reduces the complexity of the associated fabrication process compared to those of other microrobots, as the unimorph actuators are fairly easy to build. Lastly, we hypothesize that given the relatively low wing-loading affecting their flapping mechanisms, the life expectancy of Bee$^+$s must be considerably higher than those of the two-winged counterparts. The functionality and basic capabilities of Bee$^+$ are demonstrated through a set of simple control experiments. We anticipate that this new platform will enable the implementation of high-performance controllers for the execution of high-speed aerobatic maneuvers at the sub-100-mg scale as well as diversifying the lines of research in the quest for achieving full autonomy at the sub-gram scale.