Semi-grant-free non-orthogonal multiple access (semi-GF NOMA) has emerged as a promising technology for fifth-generation new radio (5G-NR) networks, supporting the coexistence of a large number of random connections with various quality-of-service requirements. However, implementing a semi-GF NOMA mechanism in 5G-NR networks with heterogeneous services raises several resource management problems related to the unpredictable interference caused by the GF access strategy. To cope with this challenge, this paper develops a novel hybrid optimization and multi-agent deep (HOMAD) reinforcement learning-based resource allocation design to maximize the energy efficiency (EE) of semi-GF NOMA 5G-NR systems. In this design, a multi-agent deep Q-network (MADQN) approach conducts the subchannel assignment (SA) among users, while optimization-based methods optimize the transmission power for every SA setting. In addition, a full MADQN scheme conducting both SA and power allocation is considered for comparison. Simulation results show that the HOMAD approach significantly outperforms the benchmarks in terms of convergence time and average EE.
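The hybrid split described above (a discrete SA decision on the outside, a power optimization for each SA on the inside) can be illustrated with a toy sketch. This is not the paper's algorithm: exhaustive enumeration stands in for the MADQN agents, a grid search stands in for the optimization-based power allocation, and the uplink sum rate per subchannel assumes ideal successive interference cancellation. The function names, the channel-gain layout, and the default noise and circuit-power values are all hypothetical.

```python
import itertools
import math


def energy_efficiency(assignment, powers, gains, noise=1.0, p_circuit=1.0):
    """Sum rate (bits/s/Hz) divided by total consumed power.

    assignment[u] is the subchannel of user u, powers[u] its transmit power,
    and gains[u][k] the channel power gain of user u on subchannel k.
    Assumption: with ideal SIC, the uplink sum rate on one subchannel is
    log2(1 + total received power / noise).
    """
    received = {}
    for user, subchannel in enumerate(assignment):
        received[subchannel] = (
            received.get(subchannel, 0.0) + powers[user] * gains[user][subchannel]
        )
    sum_rate = sum(math.log2(1.0 + r / noise) for r in received.values())
    return sum_rate / (sum(powers) + p_circuit)


def best_powers(assignment, gains, levels):
    """Inner step: grid search over power levels for a fixed assignment
    (a stand-in for the optimization-based power allocation)."""
    candidates = itertools.product(levels, repeat=len(assignment))
    return max(candidates, key=lambda p: energy_efficiency(assignment, p, gains))


def hybrid_search(gains, n_subchannels, levels):
    """Outer step: enumerate every assignment (a stand-in for the learned
    SA policy) and keep the one whose optimized powers give the highest EE."""
    best = None
    n_users = len(gains)
    for assignment in itertools.product(range(n_subchannels), repeat=n_users):
        powers = best_powers(assignment, gains, levels)
        ee = energy_efficiency(assignment, powers, gains)
        if best is None or ee > best[0]:
            best = (ee, assignment, powers)
    return best


# Two users, two subchannels; each user has one strong subchannel.
gains = [[1.0, 0.1], [0.1, 1.0]]
ee, assignment, powers = hybrid_search(gains, n_subchannels=2, levels=(0.5, 1.0, 2.0))
```

In this toy instance the search places each user on its strong subchannel. The point of the decomposition is that the learning component only has to explore the discrete SA space, since the power for each candidate SA is computed rather than learned.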
This paper aims to jointly determine linear precoding (LP) vectors, beam hopping (BH), and discrete DVB-S2X transmission rates for GEO satellite communication systems so as to minimize the payload power consumption while satisfying ground users' demands within a time window. To account for the constraint on the maximum number of illuminated beams per time slot, the design is formulated as a sparse optimization problem in which the hardware-related beam illumination energy is modeled as a sparsity term of the LP vectors. To cope with this problem, a compressed sensing method is employed to transform the sparsity terms into a quadratic form of the precoders. Then, an iterative window-based algorithm is developed to update the LP vectors sequentially toward an efficient solution. Additionally, two two-phase frameworks are proposed for comparison. In the first phase, these methods determine the MODCOD transmission schemes that meet the users' demands, using either a heuristic approach or a deep neural network (DNN) tool. In the second phase, the LP vectors of each time slot are optimized separately based on the determined MODCOD schemes.
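To make the sparsity-to-quadratic step concrete, the sketch below uses one common compressed-sensing device, a reweighted quadratic surrogate for the count of illuminated beams. It is an assumption that this matches the paper's exact transformation; the abstract only states that the sparsity terms become a quadratic form of the precoders. The array layout (one row per beam), the function names, and the smoothing constant `eps` are hypothetical.

```python
import numpy as np


def illumination_weights(precoders, eps=1e-3):
    """Weights 1 / (||w_b||^2 + eps) computed from the current precoders.

    precoders has shape (n_beams, n_antennas); row b is the LP vector of
    beam b (assumed layout). A beam with a zero precoder is not illuminated.
    """
    beam_power = np.sum(np.abs(precoders) ** 2, axis=1)
    return 1.0 / (beam_power + eps)


def quadratic_surrogate(precoders, weights):
    """Weighted sum of squared precoder norms: quadratic in the precoders
    for fixed weights, and close to the number of illuminated beams when
    the weights were computed at (or near) the same precoders."""
    beam_power = np.sum(np.abs(precoders) ** 2, axis=1)
    return float(np.sum(weights * beam_power))


# Three beams, two antennas; the second beam is switched off.
W = np.array([[1 + 1j, 2], [0, 0], [0.5, 0]])
weights = illumination_weights(W, eps=1e-6)
approx_active_beams = quadratic_surrogate(W, weights)
```

In an iterative scheme of this kind, the weights would be recomputed from the latest precoders before each update, so that every subproblem stays quadratic in the LP vectors while the surrogate tracks the beam count.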