Abstract: Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by augmenting the training objective with an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This fully gradient-based loss requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks, preserving scalability while requiring only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art world model architectures, yielding Optimistic DreamerV3 and Optimistic STORM, which demonstrate significant improvements in sample efficiency and cumulative return over their baseline counterparts.
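To make the reward-biased idea concrete, the following toy sketch fits a single Bernoulli transition probability by gradient descent on a negative log-likelihood minus a reward bonus. It is only an illustration of the general RBMLE principle under stated assumptions, not the paper's loss: the function name `fit_transition_prob`, the weight `alpha`, and the one-parameter model are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_transition_prob(successes, trials, reward=1.0, alpha=0.0,
                        lr=0.05, steps=5000):
    """Fit p = P(next state is the rewarding one) by gradient descent.

    Assumed toy objective (not taken from the paper):
        loss(p) = NLL(p) - alpha * reward * p
    where NLL is the Bernoulli negative log-likelihood of the observed
    transitions and p is parameterised through a logit.
    """
    logit = 0.0
    for _ in range(steps):
        p = sigmoid(logit)
        # d NLL / d logit for a Bernoulli model
        grad_nll = trials * p - successes
        # d (alpha * reward * p) / d logit
        grad_bonus = alpha * reward * p * (1.0 - p)
        logit -= lr * (grad_nll - grad_bonus)
    return sigmoid(logit)


# Plain maximum likelihood versus a reward-biased fit on the same data.
mle = fit_transition_prob(successes=2, trials=10, alpha=0.0)
optimistic = fit_transition_prob(successes=2, trials=10, alpha=5.0)
print(round(mle, 3), round(optimistic, 3))
```

With `alpha=0` the fit recovers the empirical frequency 2/10; with a positive `alpha` the estimate is pulled toward the rewarding outcome, and no confidence interval or constrained optimization is involved.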




Abstract: The evolution of 5G New Radio (NR) has brought significant improvements in signal strength and service quality for users. By integrating Multiple-Input Multiple-Output (MIMO) systems into communications, multiple data streams can be transmitted simultaneously across multiple antennas. Additionally, the incorporation of precoding in MIMO systems enables enhanced data rates and spectral efficiency. In wireless networks, precoders are used to steer high-gain beams toward specific users. This paper focuses on the implementation of 16-, 32-, and 64-channel linear precoders in the Remote Radio Head (RRH) of the indigenously developed 5G testbed at IIT Madras. These precoders include a memory module to store channel matrices and a multiplier module to perform matrix multiplications between the channel matrices and user data within a slot duration of 500 microseconds. The system shows DSP utilization of 9.75%, 19.5%, and 39% for the (16 x 8), (32 x 8), and (64 x 8) antenna-layer configurations, respectively, with corresponding Block RAM (BRAM) usage of 2.28%, 3.91%, and 7.16%. Additionally, a throughput of 1.2 Gbps with four active layers highlights the system's optimized performance under hardware constraints.
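The multiplier module's core operation can be sketched in software as a matrix-vector product that maps per-layer symbols to per-antenna samples. This is a minimal illustration under stated assumptions, not the RRH implementation: the function names `precode` and `complex_macs_per_symbol_vector` are hypothetical, and the weight matrix is taken to have one row per antenna and one column per layer.

```python
def precode(weights, layer_symbols):
    """Apply a linear precoder: x = W s.

    weights: list of n_antennas rows, each holding n_layers complex weights
             (assumed layout, one row per antenna).
    layer_symbols: list of n_layers complex symbols, one per layer.
    Returns one complex sample per antenna.
    """
    n_layers = len(layer_symbols)
    out = []
    for row in weights:
        if len(row) != n_layers:
            raise ValueError("each weight row needs one entry per layer")
        out.append(sum(w * s for w, s in zip(row, layer_symbols)))
    return out


def complex_macs_per_symbol_vector(n_antennas, n_layers):
    """Complex multiply-accumulates needed for one x = W s product."""
    return n_antennas * n_layers


# Tiny 2-antenna, 2-layer example.
samples = precode([[1, 1j], [2, 0]], [1 + 0j, 1j])

# Work per symbol vector for the three configurations in the abstract.
macs = [complex_macs_per_symbol_vector(n, 8) for n in (16, 32, 64)]
print(samples, macs)
```

The multiply-accumulate count doubles with each step from 16 to 32 to 64 antennas at eight layers, which is consistent with the reported DSP utilization doubling from 9.75% to 19.5% to 39%.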