Abstract: In this paper, a reinforcement learning technique is employed to maximize the performance of a cognitive radio network (CRN). In the presence of primary users (PUs), two secondary users (SUs) are assumed to access the licensed band in underlay mode. In addition, the SU transmitter is assumed to be an energy-constrained device that must harvest energy in order to transmit signals to its intended destination. We therefore consider two main sources of energy: the interference from PUs' transmissions and ambient radio frequency (RF) sources. Based on a predetermined threshold, the SU selects whether to gather energy from the PUs or only from ambient sources. Energy harvesting from the PUs' transmissions is accomplished via the time switching approach. Moreover, based on a deep Q-network (DQN) approach, the SU transmitter decides in each time slot whether to harvest energy or transmit, and selects a suitable transmission power, in order to maximize its average data rate. Our results show that the proposed approach converges and outperforms a baseline strategy.
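To illustrate the decision problem described above, the following is a minimal DQN sketch for the SU transmitter. It is not the paper's implementation: the state features (battery level, channel gain, PU activity), the action set (one harvest action plus a few discrete power levels), and all hyperparameters are assumptions for illustration.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

# Actions: 0 = harvest energy; 1..N_POWER_LEVELS = transmit at power level p.
N_POWER_LEVELS = 3
N_ACTIONS = 1 + N_POWER_LEVELS
STATE_DIM = 3  # hypothetical features: (battery level, channel gain, PU activity)

class QNet(nn.Module):
    """Small MLP approximating Q(s, a) for the SU transmitter."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)

q_net = QNet()
target_net = QNet()
target_net.load_state_dict(q_net.state_dict())
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)        # replay buffer of (s, a, r, s') tuples
GAMMA, BATCH, EPS = 0.99, 64, 0.1    # assumed hyperparameters

def select_action(state):
    """Epsilon-greedy choice between harvesting and the transmit power levels."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q_net(torch.tensor(state)).argmax().item()

def train_step():
    """One DQN update from a minibatch of stored transitions."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s, a, r, s2 = map(torch.tensor, zip(*batch))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: instantaneous rate (reward) plus discounted value.
        target = r.float() + GAMMA * target_net(s2.float()).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch, the reward would be the achieved data rate for a successful transmission (zero while harvesting), so maximizing the discounted return approximates maximizing the average rate described in the abstract.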
Abstract: In this paper, we propose a non-orthogonal multiple access (NOMA)-based communication framework that allows machine-type devices (MTDs) to access the network while avoiding congestion. The proposed technique is a two-step mechanism: first, fast uplink grant is employed to schedule devices without their sending requests to the base station (BS); second, NOMA pairing is performed in a distributed manner to reduce signaling overhead. Since the BS has limited capability to gather information in massive scenarios, learning techniques are well suited to such problems. Therefore, multi-armed bandit learning is adopted to schedule the fast-grant MTDs. Then, constrained random NOMA pairing is proposed, which helps decouple the two main challenges of fast uplink grant schemes, namely active-set prediction and optimal scheduling. Using NOMA, resource wastage due to prediction errors is significantly reduced. Additionally, the results show that the proposed scheme attains the performance of the impractical optimal OMA scheme, in terms of achievable rewards, at an affordable complexity.
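The following is a minimal sketch of the two-step idea: a bandit scheduler that grants uplink resources to the devices most likely to be active, followed by a random pairing of the granted devices. The UCB1 index, the device count, the number of grants, and the pairing rule are all assumptions chosen for illustration; the paper's constrained pairing may differ.

```python
import math
import random

N_MTDS = 20    # hypothetical number of machine-type devices (bandit arms)
N_GRANTS = 4   # hypothetical number of fast uplink grants issued per slot

counts = [0] * N_MTDS    # times each device has been granted so far
values = [0.0] * N_MTDS  # empirical mean reward (1 if the device was active)

def ucb_schedule(t):
    """Grant the N_GRANTS devices with the highest UCB1 index at slot t;
    devices never tried before are granted first (infinite index)."""
    def index(i):
        if counts[i] == 0:
            return float("inf")
        return values[i] + math.sqrt(2 * math.log(t) / counts[i])
    return sorted(range(N_MTDS), key=index, reverse=True)[:N_GRANTS]

def noma_pair(granted):
    """Random NOMA pairing (sketch): shuffle the granted set and place
    consecutive devices on the same resource block, so an inactive device
    paired with an active one does not waste the whole block."""
    g = list(granted)
    random.shuffle(g)
    return [tuple(g[i:i + 2]) for i in range(0, len(g), 2)]

def update(device, reward):
    """Incremental mean update after observing whether the device was active."""
    counts[device] += 1
    values[device] += (reward - values[device]) / counts[device]
```

A typical slot would call `ucb_schedule(t)`, pair the result with `noma_pair`, then feed the observed activity of each granted device back through `update`, so prediction errors cost half a resource block rather than a full one.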