In this paper, a novel framework for guaranteeing ultra-reliable millimeter wave (mmW) communications using multiple artificial intelligence (AI)-enabled reconfigurable intelligent surfaces (RISs) is proposed. The use of multiple AI-powered RISs allows changing the propagation direction of the signals transmitted from a mmW access point (AP) thereby improving coverage particularly for non-line-of-sight (NLoS) areas. However, due to the possibility of highly stochastic blockage over mmW links, designing an intelligent controller to jointly optimize the mmW AP beam and RIS phase shifts is a daunting task. In this regard, first, a parametric risk-sensitive episodic return is proposed to maximize the expected bit rate and mitigate the risk of mmW link blockage. Then, a closed-form approximation of the policy gradient of the risk-sensitive episodic return is analytically derived. Next, the problem of joint beamforming for mmW AP and phase shift control for mmW RISs is modeled as an identical payoff stochastic game within a cooperative multi-agent environment, in which the agents are the mmW AP and the RISs. Two centralized and distributed controllers are proposed to control the policies of the mmW AP and RISs. To directly find an optimal solution, the parametric functional-form policies for these controllers are modeled using deep recurrent neural networks (RNNs). Simulation results show that the error between policies of the optimal and the RNN-based controllers is less than 1.5%. Moreover, the variance of the achievable rates resulting from the deep RNN-based controllers is 60% less than the variance of the risk-averse baseline.
Conventional anti-jamming method mostly rely on frequency hopping to hide or escape from jammer. These approaches are not efficient in terms of bandwidth usage and can also result in a high probability of jamming. Different from existing works, in this paper, a novel anti-jamming strategy is proposed based on the idea of deceiving the jammer into attacking a victim channel while maintaining the communications of legitimate users in safe channels. Since the jammer's channel information is not known to the users, an optimal channel selection scheme and a sub optimal power allocation are proposed using reinforcement learning (RL). The performance of the proposed anti-jamming technique is evaluated by deriving the statistical lower bound of the total received power (TRP). Analytical results show that, for a given access point, over 50 % of the highest achievable TRP, i.e. in the absence of jammers, is achieved for the case of a single user and three frequency channels. Moreover, this value increases with the number of users and available channels. The obtained results are compared with two existing RL based anti-jamming techniques, and random channel allocation strategy without any jamming attacks. Simulation results show that the proposed anti-jamming method outperforms the compared RL based anti-jamming methods and random search method, and yields near optimal achievable TRP.
The effective deployment of connected vehicular networks is contingent upon maintaining a desired performance across spatial and temporal domains. In this paper, a graph-based framework, called SMART, is proposed to model and keep track of the spatial and temporal statistics of vehicle-to-infrastructure (V2I) communication latency across a large geographical area. SMART first formulates the spatio-temporal performance of a vehicular network as a graph in which each vertex corresponds to a subregion consisting of a set of neighboring location points with similar statistical features of V2I latency and each edge represents the spatio-correlation between latency statistics of two connected vertices. Motivated by the observation that the complete temporal and spatial latency performance of a vehicular network can be reconstructed from a limited number of vertices and edge relations, we develop a graph reconstruction-based approach using a graph convolutional network integrated with a deep Q-networks algorithm in order to capture the spatial and temporal statistic of feature map pf latency performance for a large-scale vehicular network. Extensive simulations have been conducted based on a five-month latency measurement study on a commercial LTE network. Our results show that the proposed method can significantly improve both the accuracy and efficiency for modeling and reconstructing the latency performance of large vehicular networks.
A new federated learning (FL) framework enabled by large-scale wireless connectivity is proposed for designing the autonomous controller of connected and autonomous vehicles (CAVs). In this framework, the learning models used by the controllers are collaboratively trained among a group of CAVs. To capture the varying CAV participation in the FL training process and the diverse local data quality among CAVs, a novel dynamic federated proximal (DFP) algorithm is proposed that accounts for the mobility of CAVs, the wireless fading channels, as well as the unbalanced and nonindependent and identically distributed data across CAVs. A rigorous convergence analysis is performed for the proposed algorithm to identify how fast the CAVs converge to using the optimal autonomous controller. In particular, the impacts of varying CAV participation in the FL process and diverse CAV data quality on the convergence of the proposed DFP algorithm are explicitly analyzed. Leveraging this analysis, an incentive mechanism based on contract theory is designed to improve the FL convergence speed. Simulation results using real vehicular data traces show that the proposed DFP-based controller can accurately track the target CAV speed over time and under different traffic scenarios. Moreover, the results show that the proposed DFP algorithm has a much faster convergence compared to popular FL algorithms such as federated averaging (FedAvg) and federated proximal (FedProx). The results also validate the feasibility of the contract-theoretic incentive mechanism and show that the proposed mechanism can improve the convergence speed of the DFP algorithm by 40% compared to the baselines.
In this paper, a novel framework is proposed to perform data-driven air-to-ground (A2G) channel estimation for millimeter wave (mmWave) communications in an unmanned aerial vehicle (UAV) wireless network. First, an effective channel estimation approach is developed to collect mmWave channel information, allowing each UAV to train a stand-alone channel model via a conditional generative adversarial network (CGAN) along each beamforming direction. Next, in order to expand the application scenarios of the trained channel model into a broader spatial-temporal domain, a cooperative framework, based on a distributed CGAN architecture, is developed, allowing each UAV to collaboratively learn the mmWave channel distribution in a fully-distributed manner. To guarantee an efficient learning process, necessary and sufficient conditions for the optimal UAV network topology that maximizes the learning rate for cooperative channel modeling are derived, and the optimal CGAN learning solution per UAV is subsequently characterized, based on the distributed network structure. Simulation results show that the proposed distributed CGAN approach is robust to the local training error at each UAV. Meanwhile, a larger airborne network size requires more communication resources per UAV to guarantee an efficient learning rate. The results also show that, compared with a stand-alone CGAN without information sharing and two other distributed schemes, namely: A multi-discriminator CGAN and a federated CGAN method, the proposed distributed CGAN approach yields a higher modeling accuracy while learning the environment, and it achieves a larger average data rate in the online performance of UAV downlink mmWave communications.
In this paper, the problem of enhancing the quality of virtual reality (VR) services is studied for an indoor terahertz (THz)/visible light communication (VLC) wireless network. In the studied model, small base stations (SBSs) transmit high-quality VR images to VR users over THz bands and light-emitting diodes (LEDs) provide accurate indoor positioning services for them using VLC. Here, VR users move in real time and their movement patterns change over time according to their applications. Both THz and VLC links can be blocked by the bodies of VR users. To control the energy consumption of the studied THz/VLC wireless VR network, VLC access points (VAPs) must be selectively turned on so as to ensure accurate and extensive positioning for VR users. Based on the user positions, each SBS must generate corresponding VR images and establish THz links without body blockage to transmit the VR content. The problem is formulated as an optimization problem whose goal is to maximize the average number of successfully served VR users by selecting the appropriate VAPs to be turned on and controlling the user association with SBSs. To solve this problem, a meta policy gradient (MPG) algorithm that enables the trained policy to quickly adapt to new user movement patterns is proposed. In order to solve the problem for VR scenarios with a large number of users, a dual method based MPG algorithm (D-MPG) with a low complexity is proposed. Simulation results demonstrate that, compared to a baseline trust region policy optimization algorithm (TRPO), the proposed MPG and D-MPG algorithms yield up to 38.2% and 33.8% improvement in the average number of successfully served users as well as 75% and 87.5% gains in the convergence speed, respectively.
Cooperative perception plays a vital role in extending a vehicle's sensing range beyond its line-of-sight. However, exchanging raw sensory data under limited communication resources is infeasible. Towards enabling an efficient cooperative perception, vehicles need to address the following fundamental question: What sensory data needs to be shared?, at which resolution?, and with which vehicles? To answer this question, in this paper, a novel framework is proposed to allow reinforcement learning (RL)-based vehicular association, resource block (RB) allocation, and content selection of cooperative perception messages (CPMs) by utilizing a quadtree-based point cloud compression mechanism. Furthermore, a federated RL approach is introduced in order to speed up the training process across vehicles. Simulation results show the ability of the RL agents to efficiently learn the vehicles' association, RB allocation, and message content selection while maximizing vehicles' satisfaction in terms of the received sensory information. The results also show that federated RL improves the training process, where better policies can be achieved within the same amount of time compared to the non-federated approach.
In this paper, the problem of the trajectory design for a group of energy-constrained drones operating in dynamic wireless network environments is studied. In the considered model, a team of drone base stations (DBSs) is dispatched to cooperatively serve clusters of ground users that have dynamic and unpredictable uplink access demands. In this scenario, the DBSs must cooperatively navigate in the considered area to maximize coverage of the dynamic requests of the ground users. This trajectory design problem is posed as an optimization framework whose goal is to find optimal trajectories that maximize the fraction of users served by all DBSs. To find an optimal solution for this non-convex optimization problem under unpredictable environments, a value decomposition based reinforcement learning (VDRL) solution coupled with a meta-training mechanism is proposed. This algorithm allows the DBSs to dynamically learn their trajectories while generalizing their learning to unseen environments. Analytical results show that, the proposed VD-RL algorithm is guaranteed to converge to a local optimal solution of the non-convex optimization problem. Simulation results show that, even without meta-training, the proposed VD-RL algorithm can achieve a 53.2% improvement of the service coverage and a 30.6% improvement in terms of the convergence speed, compared to baseline multi-agent algorithms. Meanwhile, the use of meta-learning improves the convergence speed of the VD-RL algorithm by up to 53.8% when the DBSs must deal with a previously unseen task.
In this paper, a novel communication framework that uses an unmanned aerial vehicle (UAV)-carried intelligent reflector (IR) is proposed to enhance multi-user downlink transmissions over millimeter wave (mmWave) frequencies. In order to maximize the downlink sum-rate, the optimal precoding matrix (at the base station) and reflection coefficient (at the IR) are jointly derived. Next, to address the uncertainty of mmWave channels and maintain line-of-sight links in a real-time manner, a distributional reinforcement learning approach, based on quantile regression optimization, is proposed to learn the propagation environment of mmWave communications, and, then, optimize the location of the UAV-IR so as to maximize the long-term downlink communication capacity. Simulation results show that the proposed learning-based deployment of the UAV-IR yields a significant advantage, compared to a non-learning UAV-IR, a static IR, and a direct transmission schemes, in terms of the average data rate and the achievable line-of-sight probability of downlink mmWave communications.
Edge intelligence, also called edge-native artificial intelligence (AI), is an emerging technological framework focusing on seamless integration of AI, communication networks, and mobile edge computing. It has been considered to be one of the key missing components in the existing 5G network and is widely recognized to be one of the most sought-after functions for tomorrow's wireless 6G cellular systems. In this article, we identify the key requirements and challenges of edge-native AI in 6G. A self-learning architecture based on self-supervised Generative Adversarial Nets (GANs) is introduced to \blu{demonstrate the potential performance improvement that can be achieved by automatic data learning and synthesizing at the edge of the network}. We evaluate the performance of our proposed self-learning architecture in a university campus shuttle system connected via a 5G network. Our result shows that the proposed architecture has the potential to identify and classify unknown services that emerge in edge computing networks. Future trends and key research problems for self-learning-enabled 6G edge intelligence are also discussed.