This paper presents a new deep reinforcement learning (DRL)-based approach to the trajectory planning and jamming rejection of an unmanned aerial vehicle (UAV) for the Internet-of-Things (IoT) applications. Jamming can prevent timely delivery of sensing data and reception of operation instructions. With the assistance of a reconfigurable intelligent surface (RIS), we propose to augment the radio environment, suppress jamming signals, and enhance the desired signals. The UAV is designed to learn its trajectory and the RIS configuration based solely on changes in its received data rate, using the latest deep deterministic policy gradient (DDPG) and twin delayed DDPG (TD3) models. Simulations show that the proposed DRL algorithms give the UAV with strong resistance against jamming and that the TD3 algorithm exhibits faster and smoother convergence than the DDPG algorithm, and suits better for larger RISs. This DRL-based approach eliminates the need for knowledge of the channels involving the RIS and jammer, thereby offering significant practical value.
Reconfigurable intelligent surfaces (RISs) can potentially combat jamming attacks by diffusing jamming signals. This paper jointly optimizes user selection, channel allocation, modulation-coding, and RIS configuration in a multiuser OFDMA system under a jamming attack. This problem is non-trivial and has never been addressed, because of its mixed-integer programming nature and difficulties in acquiring channel state information (CSI) involving the RIS and jammer. We propose a new deep reinforcement learning (DRL)-based approach, which learns only through changes in the received data rates of the users to reject the jamming signals and maximize the sum rate of the system. The key idea is that we decouple the discrete selection of users, channels, and modulation-coding from the continuous RIS configuration, hence facilitating the RIS configuration with the latest twin delayed deep deterministic policy gradient (TD3) model. Another important aspect is that we show a winner-takes-all strategy is almost surely optimal for selecting the users, channels, and modulation-coding, given a learned RIS configuration. Simulations show that the new approach converges fast to fulfill the benefit of the RIS, due to its substantially small state and action spaces. Without the need of the CSI, the approach is promising and offers practical value.
Autonomous tracking of suspicious unmanned aerial vehicles (UAVs) by legitimate monitoring UAVs (or monitors) can be crucial to public safety and security. It is non-trivial to optimize the trajectory of a monitor while conceiving its monitoring intention, due to typically non-convex propulsion and thrust power functions. This paper presents a novel framework to jointly optimize the propulsion and thrust powers, as well as the 3D trajectory of a solar-powered monitor which conducts covert, video-based, UAV-on-UAV tracking and surveillance. A multi-objective problem is formulated to minimize the energy consumption of the monitor and maximize a weighted sum of distance keeping and altitude changing, which measures the disguising of the monitor. Based on the practical power models of the UAV propulsion, thrust and hovering, and the model of the harvested solar power, the problem is non-convex and intangible for existing solvers. We convexify the propulsion power by variable substitution, and linearize the solar power. With successive convex approximation, the resultant problem is then transformed with tightened constraints and efficiently solved by the proximal difference-of-convex algorithm with extrapolation in polynomial time. The proposed scheme can be also applied online. Extensive simulations corroborate the merits of the scheme, as compared to baseline schemes with partial or no disguising.