In this paper, we consider UAVs equipped with a VLC access point and coordinated multipoint (CoMP) capability, which allows users to connect to more than one UAV. The UAVs move in three-dimensional (3D) space at a constant acceleration within each master timescale, and a central server is responsible for synchronization and cooperation among the UAVs. We define the data rate for each user type, CoMP and non-CoMP. Since constant-speed UAV motion is not practical, the effect of acceleration on UAV movement must be taken into account; unlike most existing works, we incorporate the effect of variable speed into the kinematic and allocation formulas. For the proposed system model, we define two timescales over which resources are allocated: in the master timescale, the acceleration of each UAV is specified, and in each short timescale, radio resources are allocated. The initial velocity in each short time slot is the final velocity of the previous slot. Our goal is to formulate a multiobjective optimization problem that simultaneously maximizes the total data rate and minimizes the total communication power consumption. To handle this multiobjective problem, we first apply the weighted-sum method and then employ multi-agent deep deterministic policy gradient (MADDPG), a multi-agent extension of deep deterministic policy gradient (DDPG) that ensures more stable and faster convergence. We improve this solution method by adding two critic networks and a two-step acceleration allocation. Simulation results indicate that the constant-acceleration motion of UAVs yields about 8\% better performance than conventional motion models. Furthermore, CoMP enables the system to achieve, on average, about 12\% higher rates compared with a non-CoMP system.
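The two-timescale motion model above, where acceleration is fixed once per master timescale and each short slot starts from the velocity left over by the previous slot, can be sketched as follows (a minimal kinematics illustration under assumed names; not the paper's implementation):

```python
import numpy as np

def propagate(pos, vel, acc, dt, n_slots):
    """Advance one UAV through n_slots short time slots under a constant
    acceleration chosen once per master timescale. The velocity at the start
    of each short slot is the velocity carried over from the previous slot.
    All names (propagate, dt, n_slots) are illustrative."""
    trajectory = [pos.copy()]
    for _ in range(n_slots):
        pos = pos + vel * dt + 0.5 * acc * dt**2  # constant-acceleration kinematics
        vel = vel + acc * dt                      # carried into the next short slot
        trajectory.append(pos.copy())
    return np.array(trajectory), vel
```

Under this model the per-slot positions used in the rate formulas follow directly from the master-timescale acceleration decision.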
In this paper, we propose a joint radio and core resource allocation framework for NFV-enabled networks. In the proposed system model, the goal is to maximize energy efficiency (EE) while guaranteeing end-to-end (E2E) quality of service (QoS) for different service types. To this end, we formulate an optimization problem in which power and spectrum resources are allocated in the radio part, while in the core part the chaining, placement, and scheduling of functions are performed to ensure the QoS of all users. This joint optimization problem is modeled as a Markov decision process (MDP) that captures the time-varying characteristics of the available resources and wireless channels. A soft actor-critic deep reinforcement learning (SAC-DRL) algorithm based on the maximum-entropy framework is subsequently utilized to solve this MDP. Numerical results reveal that the proposed joint approach based on the SAC-DRL algorithm significantly reduces energy consumption compared to the case in which the radio resource allocation (R-RA) and NFV resource allocation (NFV-RA) problems are optimized separately.
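The maximum-entropy framework behind SAC augments the expected return with a policy-entropy bonus; for a discrete action set the soft state value is $V(s) = \mathbb{E}_{a\sim\pi}[Q(s,a) - \alpha \log\pi(a|s)]$ with $\pi$ the Boltzmann policy induced by $Q$. A minimal numerical sketch (not the paper's full SAC-DRL agent):

```python
import numpy as np

def soft_state_value(q_values, alpha):
    """Maximum-entropy (SAC-style) state value for a discrete action set:
    V(s) = E_{a~pi}[Q(s,a) - alpha * log pi(a|s)], with pi the softmax
    (Boltzmann) policy induced by Q at temperature alpha."""
    logits = q_values / alpha
    logits = logits - logits.max()               # numerical stability
    pi = np.exp(logits) / np.exp(logits).sum()   # soft policy pi(a|s)
    return float((pi * (q_values - alpha * np.log(pi))).sum())
```

For this discrete case the result coincides with $\alpha \log \sum_a e^{Q(s,a)/\alpha}$, the soft-max value that the entropy term produces.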
In this paper, security-aware robust resource allocation in energy-harvesting cognitive radio networks is considered with cooperation between two transmitters under uncertainty in the channel gains and the battery energy value. Specifically, the primary access point harvests energy from a green energy source and uses a time-switching protocol to send energy and data to the secondary access point (SAP). Using the power-domain non-orthogonal multiple access technique, the SAP helps the primary network improve the security of its data transmission while using the primary network's frequency band. In this regard, we introduce the problem of maximizing the proportional-fair energy efficiency (PFEE) under uncertainty in the channel gains and battery energy value, subject to practical constraints; moreover, the channel gain of the eavesdropper is assumed to be unknown. Employing a decentralized partially observable Markov decision process, we investigate the solution of the corresponding resource allocation problem. We exploit multi-agent single-reward deep deterministic policy gradient (MASRDDPG) and recurrent deterministic policy gradient (RDPG) methods, and compare them with state-of-the-art baselines such as multi-agent and single-agent DDPG. Simulation results show that both the MASRDDPG and RDPG methods outperform the state-of-the-art methods by delivering higher PFEE to the network.
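A proportional-fair objective is commonly expressed as the sum of logarithms of the per-user metric, so maximizing PFEE trades aggregate energy efficiency against fairness among users. A minimal sketch of such a metric (variable names are illustrative, not the paper's exact formulation):

```python
import numpy as np

def proportional_fair_ee(rates, powers):
    """Proportional-fair energy efficiency: sum of log per-user energy
    efficiencies (rate / power). The log-sum rewards balanced per-user EE
    rather than concentrating all resources on one user."""
    ee = np.asarray(rates, float) / np.asarray(powers, float)
    return float(np.log(ee).sum())
```

Because the log is concave, improving the worst user's EE raises the objective more than an equal improvement for an already well-served user.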
This paper proposes a novel approach to improve the performance of a heterogeneous network (HetNet) supported by dual connectivity (DC) by adopting multiple unmanned aerial vehicles (UAVs) as passive relays that carry reconfigurable intelligent surfaces (RISs). More specifically, the RISs are mounted beneath the UAVs, termed UAV-RISs, and operate over the micro-wave ($\mu$W) channel in the sky to sustain a strong line-of-sight (LoS) connection with the ground users. The macro-cell operates over the $\mu$W channel based on orthogonal multiple access (OMA), while small base stations (SBSs) operate over the millimeter-wave (mmW) channel based on non-orthogonal multiple access (NOMA). We study the problem of total transmit power minimization by jointly optimizing the trajectory/velocity of each UAV, the RISs' phase shifts, subcarrier allocations, and active beamformers at each BS. The underlying problem is highly non-convex, and the globally optimal solution is intractable. To handle it, we decompose the original problem into two subproblems: one that deals with the UAVs' trajectories/velocities, the RISs' phase shifts, and subcarrier allocation for $\mu$W, and one for active beamforming design and subcarrier allocation for mmW. In particular, we solve the first subproblem via the dueling deep Q-network (DQN) learning approach by developing a distributed algorithm, which leads to better policy evaluation. We then solve the active beamforming design and subcarrier allocation for mmW via the successive convex approximation (SCA) method. Simulation results exhibit the effectiveness of the proposed resource allocation scheme compared to other baseline schemes. In particular, it is revealed that by deploying UAV-RISs, the transmit power can be reduced by 6 dB while maintaining the same guaranteed QoS.
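The dueling DQN used for the first subproblem splits the Q-function into a state value $V(s)$ and per-action advantages $A(s,a)$, recombined with a mean-subtraction that makes the decomposition identifiable. A minimal sketch of the standard dueling aggregation (the network layers themselves are omitted):

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a').
    Subtracting the mean advantage pins down the otherwise unidentifiable
    split between the value and advantage streams."""
    adv = np.asarray(advantages, float)
    return value + adv - adv.mean()
```

Separating V from A is what yields the improved policy evaluation the abstract refers to: the value stream is learned from every transition, regardless of which action was taken.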
In this paper, we propose a novel joint intelligent trajectory design and resource allocation algorithm based on users' mobility and their requested services for unmanned aerial vehicle (UAV)-assisted networks, where UAVs act as nodes of a network function virtualization (NFV)-enabled network. Our objective is to maximize energy efficiency and minimize the average delay over all services by allocating the limited radio and NFV resources. In addition, due to traffic conditions and user mobility, we allow some virtual network functions (VNFs) to migrate from their current locations to other locations to satisfy the quality of service (QoS) requirements. We formulate our problem to find near-optimal UAV locations, transmit powers, subcarrier assignments, and the placement and scheduling of the requested services' functions over the UAVs, and to perform suitable VNF migration. We then propose a novel hierarchical hybrid continuous and discrete action (HHCDA) deep reinforcement learning method to solve this problem. Finally, the convergence and computational complexity of the proposed algorithm and its performance are analyzed for different parameters. Simulation results show that our proposed HHCDA method decreases the request reject rate and average delay by 31.5\% and 20\%, respectively, and increases the energy efficiency by 40\% compared to the DDPG method.
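A hybrid continuous/discrete action, as in HHCDA, can be pictured as a two-stage draw: first a discrete choice (e.g., a subcarrier or placement decision) from a softmax head, then a continuous quantity (e.g., transmit power) from a Gaussian head squashed to a valid range. This is only an illustrative sketch of the action structure; the hierarchical training loop and all names here are assumptions, not the paper's code:

```python
import numpy as np

def hybrid_action(discrete_logits, cont_mean, cont_log_std, rng):
    """Sample one hybrid action: a discrete index k from a softmax over
    logits, then a continuous value conditioned on k, squashed to (0, 1)
    via a sigmoid (e.g., a normalized power level)."""
    logits = discrete_logits - discrete_logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    k = rng.choice(len(probs), p=probs)                    # discrete head
    z = rng.normal(cont_mean[k], np.exp(cont_log_std[k]))  # continuous head
    power = 1.0 / (1.0 + np.exp(-z))                       # squash to (0, 1)
    return int(k), float(power)
```

Keeping both heads in one policy lets a single agent emit the mixed decision vector (assignment plus power) that the joint problem requires.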
In delay-sensitive industrial internet of things (IIoT) applications, the age of information (AoI) is employed to characterize the freshness of information. Meanwhile, emerging network function virtualization provides flexibility and agility for service providers to deliver a given network service as a sequence of virtual network functions (VNFs). However, suitable VNF placement and scheduling in these schemes is NP-hard, and finding a globally optimal solution by traditional approaches is complex. Recently, deep reinforcement learning (DRL) has emerged as a viable way to solve such problems. In this paper, we first utilize a low-complexity single-agent compound-action actor-critic RL method to cover both discrete and continuous actions and jointly minimize VNF cost and AoI in terms of network resources under end-to-end quality of service constraints. To overcome the capacity limitation of single-agent learning, we then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other. Simulation results demonstrate that the single-agent schemes significantly outperform the greedy algorithm in terms of average network cost and AoI. Moreover, the multi-agent solution further decreases the average cost by dividing the tasks between the agents, although it needs more iterations to learn due to the required collaboration among agents.
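The AoI metric that the objective minimizes follows a simple dynamic: the age grows linearly with time and, on delivery of a fresh update, resets to the time elapsed since that update was generated. A minimal sketch of this update in discrete time slots (function and argument names are illustrative):

```python
def aoi_step(age, delivered, gen_time, now):
    """One slot of age-of-information dynamics: if an update generated at
    gen_time is delivered in the current slot, the age resets to its
    in-flight delay (now - gen_time); otherwise it grows by one slot."""
    if delivered:
        return now - gen_time
    return age + 1
```

This reset-to-delay behavior is why AoI penalizes both infrequent updates and long delivery delays, unlike plain latency.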
This paper investigates the problem of age of information (AoI)-aware radio resource management for a platooning system. Multiple autonomous platoons exploit cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate cooperative awareness messages (CAMs) to their followers while ensuring timely delivery of safety-critical messages to the road-side unit (RSU). Under dynamic channel conditions, centralized resource management schemes that require global information are inefficient and incur large signaling overheads. Hence, we exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy. Existing MARL algorithms consider a holistic reward function for the group's collective success, which often yields unsatisfactory results and cannot guarantee an optimal policy for each agent. Consequently, motivated by the existing RL literature, we propose a novel MARL framework that trains two critics: a global critic, which estimates the global expected reward and motivates the agents toward cooperative behavior, and an exclusive local critic for each agent, which estimates the agent's individual reward. Furthermore, based on the tasks each agent has to accomplish, the individual reward of each agent is decomposed into multiple sub-reward functions, and task-wise value functions are learned separately. Numerical results indicate our proposed algorithm's effectiveness compared with conventional RL methods applied in this area.
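One simple way to combine a global critic with each agent's local critic is to mix their temporal-difference errors, with a weight trading cooperation against individual objectives. The sketch below illustrates that idea only; the mixing rule, names, and weight are assumptions for illustration, not the paper's exact update:

```python
def mixed_td_error(r_global, r_local, v_g, v_g_next, v_l, v_l_next,
                   gamma=0.99, beta=0.5):
    """Mix the TD errors of a global critic (shared cooperative reward)
    and a local critic (agent's individual reward). beta in [0, 1]
    weights cooperative behavior against the agent's own objective."""
    delta_g = r_global + gamma * v_g_next - v_g  # global-critic TD error
    delta_l = r_local + gamma * v_l_next - v_l   # local-critic TD error
    return beta * delta_g + (1.0 - beta) * delta_l
```

With task-wise reward decomposition, the local term would itself be a sum of per-task TD errors, each backed by its own value function.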
This paper studies a novel approach to successive interference cancellation (SIC) ordering and beamforming in a multiple-antenna non-orthogonal multiple access (NOMA) network with a multi-carrier multi-user setup. To this end, we formulate a joint beamforming design, subcarrier allocation, user association, and SIC ordering problem to maximize the worst-case energy efficiency (EE). The formulated problem is a non-convex mixed-integer non-linear program (MINLP), which is generally difficult to solve. To handle it, we first adopt a linearization technique and relax the integer variables, and then employ the Dinkelbach algorithm to convert the problem into a more mathematically tractable form. The resulting non-convex optimization problem is transformed into an equivalent rank-constrained semidefinite program (SDP) and solved by SDP relaxation and sequential fractional programming. Furthermore, to strike a balance between complexity and performance, a low-complexity approach based on alternating optimization is adopted. Numerical results reveal that the proposed SIC ordering method outperforms conventional schemes addressed in the literature.
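The Dinkelbach algorithm turns the fractional EE objective $\max_x f(x)/g(x)$ (with $g>0$) into a sequence of parametric subproblems $\max_x f(x) - \lambda_k g(x)$, updating $\lambda_{k+1} = f(x_k)/g(x_k)$ until the parametric optimum reaches zero. A generic sketch, where `argmax_aux` stands in for the inner solver (in the paper's case, the SDP relaxation):

```python
def dinkelbach(f, g, argmax_aux, tol=1e-9, max_iter=100):
    """Dinkelbach's algorithm for max f(x)/g(x), g(x) > 0.
    argmax_aux(lam) must return an argmax of f(x) - lam * g(x); it is a
    caller-supplied black box here (e.g., an SDP or grid solver)."""
    lam = 0.0
    x = argmax_aux(lam)
    for _ in range(max_iter):
        x = argmax_aux(lam)
        if abs(f(x) - lam * g(x)) < tol:  # parametric optimum at zero
            break
        lam = f(x) / g(x)                 # next ratio estimate
    return x, lam
```

For example, maximizing $x/(x+1)$ over the grid $\{0,1,2,3\}$ with a grid-search `argmax_aux` converges in two iterations to $x=3$, $\lambda=0.75$.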