Generally, a Reinforcement Learning (RL) agent updates its policy by repeatedly interacting with the environment, based on the rewards it receives for the observed states and the actions it takes. However, environmental disturbances, which commonly lead to noisy observations (e.g., of rewards and states), can significantly degrade the agent's performance. Furthermore, the learning performance of Multi-Agent Reinforcement Learning (MARL) is even more susceptible to noise because of the interference among agents. It is therefore imperative to rethink the design of MARL so that it can mitigate the impact of noisy rewards. In this paper, we propose a novel decomposition-based multi-agent distributional RL method that approximates the globally shared noisy reward with a Gaussian mixture model (GMM) and decomposes it into a combination of individual distributional local rewards, with which each agent can be updated locally through distributional RL. Moreover, a diffusion model (DM) is leveraged for reward generation to reduce the costly interactions required for learning distributions. Furthermore, the optimality of the distribution decomposition is theoretically validated, and the loss function is carefully designed to avoid decomposition ambiguity. We also verify the effectiveness of the proposed method through extensive simulation experiments with noisy rewards. In addition, different risk-sensitive policies are evaluated to demonstrate the superiority of distributional RL in different MARL tasks.
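A minimal sketch of the forward relation that such a decomposition must be consistent with, assuming (for illustration only) that the global reward is the plain sum of independent local rewards, each modeled as a one-dimensional GMM. Under that assumption the global distribution is again a GMM: every combination of one component per agent contributes one global component whose weight is the product of the local weights and whose mean and variance are the sums of the local means and variances. The helper names are hypothetical, and neither the diffusion model nor the paper's loss function is reproduced here.

```python
import math
from itertools import product

# A 1-D Gaussian mixture is represented as a list of (weight, mean, variance) triples.


def compose_global_reward(local_gmms):
    """GMM of R = r_1 + ... + r_n for independent GMM-distributed local rewards r_i.

    Assumption (illustrative only): the shared reward is the plain sum of local rewards.
    """
    global_gmm = []
    for combo in product(*local_gmms):  # pick one component per agent
        weight = math.prod(c[0] for c in combo)
        mean = sum(c[1] for c in combo)
        var = sum(c[2] for c in combo)
        global_gmm.append((weight, mean, var))
    return global_gmm


def gmm_mean(gmm):
    return sum(w * mu for w, mu, _ in gmm)


def gmm_pdf(gmm, x):
    return sum(
        w * math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        for w, mu, var in gmm
    )


agent_1 = [(0.7, 1.0, 0.1), (0.3, -0.5, 0.2)]
agent_2 = [(0.5, 0.5, 0.05), (0.5, 2.0, 0.1)]
global_gmm = compose_global_reward([agent_1, agent_2])
print(len(global_gmm), round(gmm_mean(global_gmm), 6))  # prints: 4 1.8
```

Shifting every component mean of `agent_1` by +c and every component mean of `agent_2` by -c leaves `global_gmm` unchanged, so the global reward alone cannot pin down the local rewards. This is one simple form of the decomposition ambiguity that a loss function has to rule out.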
Reconfigurable intelligent surface (RIS) is an attractive technology for improving the transmission rate of millimetre-wave (mmWave) communication systems. Previous research on RIS technology has mainly focused on improving the transmission rate and secrecy rate of mmWave communication systems. Since RIS technology makes it possible to create an intelligent radio environment, it also has potential advantages for improving the localization accuracy of mmWave communication systems. Deployed on walls and objects, RISs can significantly improve communication and positioning coverage by controlling multipath reflections. This paper considers an RIS-aided mmWave localization system and formulates a joint beamforming and localization problem. This problem is challenging because the objective function depends on the unknown position of the user equipment (UE) and on the instantaneous channel state information (CSI). To solve it, we propose a new joint localization and beamforming optimization (JLBO) algorithm and prove its convergence. Simulation results show that the RIS can improve the user localization accuracy of the system and that the proposed scheme significantly outperforms traditional schemes.
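How an RIS "controls multipath reflection" can be illustrated with a textbook single-antenna model (an assumption for illustration; this is not the JLBO algorithm): the received signal is the direct path plus N cascaded paths through the RIS elements, and choosing each phase shift so that its cascaded path arrives in phase with the direct path maximizes the received power. The channel values below are random placeholders.

```python
import cmath
import math
import random


def received_power(h_d, g, h, theta):
    """|h_d + sum_n g_n * exp(j*theta_n) * h_n|^2 for a SISO link assisted by one RIS.

    h_d: direct BS-UE channel; g[n], h[n]: BS-to-element-n and element-n-to-UE channels.
    """
    y = h_d + sum(gn * cmath.exp(1j * t) * hn for gn, hn, t in zip(g, h, theta))
    return abs(y) ** 2


def aligned_phases(h_d, g, h):
    """Rotate each cascaded path g_n * h_n so that it adds coherently with h_d."""
    return [cmath.phase(h_d) - cmath.phase(gn * hn) for gn, hn in zip(g, h)]


def complex_gaussian(rng):
    """Circularly symmetric complex Gaussian sample with unit variance."""
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)


rng = random.Random(1)
n_elements = 16
h_d = 0.1 * complex_gaussian(rng)  # weak direct path, e.g. a partially blocked LoS
g = [complex_gaussian(rng) for _ in range(n_elements)]
h = [complex_gaussian(rng) for _ in range(n_elements)]

p_aligned = received_power(h_d, g, h, aligned_phases(h_d, g, h))
p_random = received_power(h_d, g, h, [rng.uniform(0, 2 * math.pi) for _ in range(n_elements)])
# True: coherent alignment attains the upper bound (|h_d| + sum_n |g_n h_n|)^2
print(p_aligned > p_random)
```

In a localization setting the cascaded channels depend on the unknown UE position, which is why beamforming and position estimation become coupled.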
Since the orthogonality of the line-of-sight multiple-input multiple-output (LoS MIMO) channel is only available within the Rayleigh distance, the coverage of such communication systems is restricted by the finite antenna spacing that can be implemented in practice. However, media with different permittivities along the transmission path may relax the requirement on antenna spacing. This observation could be valuable in air-to-ground LoS MIMO scenarios, given the presence of clouds in the troposphere. To analyze the random phase variations caused by a single-layer cloud, we propose a new cloud modeling method and adapt it to LoS MIMO scenarios based on real measurement data. A preliminary analysis of the channel capacity is then conducted based on the simulation results.
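To make the spacing argument concrete, the sketch below builds a 2x2 LoS channel from exact path lengths and evaluates C = log2 det(I + (SNR/2) H H^H). Assumptions: two 2-element broadside uniform linear arrays facing each other, unit-gain free-space paths, and equal power per transmit antenna. It uses the classical orthogonality condition d_t d_r = λR/N for N-element arrays. The optional `cloud_phase` argument is a hypothetical stand-in for extra per-link phase introduced by a medium, not the paper's measurement-based cloud model.

```python
import cmath
import math


def los_channel_2x2(d_t, d_r, distance, wavelength, cloud_phase=None):
    """2x2 LoS MIMO channel between two broadside 2-element ULAs with unit-gain paths.

    cloud_phase: optional 2x2 list of extra per-link phases in radians; a hypothetical
    placeholder for medium-induced phase, not a physical cloud model.
    """
    y_t = [-d_t / 2, d_t / 2]
    y_r = [-d_r / 2, d_r / 2]
    h = [[0j, 0j], [0j, 0j]]
    for m in range(2):  # receive antenna index
        for n in range(2):  # transmit antenna index
            r = math.hypot(distance, y_r[m] - y_t[n])  # exact path length
            phase = -2 * math.pi * r / wavelength
            if cloud_phase is not None:
                phase += cloud_phase[m][n]
            h[m][n] = cmath.exp(1j * phase)
    return h


def capacity_2x2(h, snr):
    """log2 det(I + (snr/2) H H^H) with equal power on both transmit antennas."""
    c = snr / 2
    g = [[sum(h[i][k] * h[j][k].conjugate() for k in range(2)) for j in range(2)]
         for i in range(2)]  # Gram matrix H H^H
    det = (1 + c * g[0][0]) * (1 + c * g[1][1]) - c * c * g[0][1] * g[1][0]
    return math.log2(det.real)


def optimal_spacing_product(distance, wavelength, n_antennas=2):
    """Orthogonality condition for N-element broadside ULAs: d_t * d_r = wavelength * R / N."""
    return wavelength * distance / n_antennas


R, WAVELENGTH, SNR = 1000.0, 0.01, 10.0  # 1 km link at 30 GHz, linear SNR of 10 (10 dB)
d_opt = math.sqrt(optimal_spacing_product(R, WAVELENGTH))  # about 2.24 m per array
c_opt = capacity_2x2(los_channel_2x2(d_opt, d_opt, R, WAVELENGTH), SNR)
c_small = capacity_2x2(los_channel_2x2(0.1, 0.1, R, WAVELENGTH), SNR)
# Idealized extra phase of pi on one link, chosen only to show that a medium-induced
# phase difference can restore orthogonality at small spacing.
c_small_medium = capacity_2x2(
    los_channel_2x2(0.1, 0.1, R, WAVELENGTH, cloud_phase=[[0.0, 0.0], [0.0, math.pi]]), SNR)
print(round(c_opt, 3), round(c_small, 3), round(c_small_medium, 3))
```

At the orthogonal spacing the capacity reaches 2 log2(1 + SNR), while at 0.1 m spacing the channel is nearly rank one. The π offset in the third case restores orthogonality, mirroring the idea that media along the path can relax the spacing requirement; a real cloud produces random rather than idealized phase variations.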
Single-pixel imaging (SPI) has attracted widespread attention because it generally acquires the object image with a non-pixelated photodetector and a digital micromirror device (DMD). Since the modulated patterns seen from the two reflection directions of the DMD are naturally complementary, complementary balanced measurements can be applied to greatly improve the measurement signal-to-noise ratio and reconstruction quality. However, the balance between the two reflection arms largely determines the quality of the differential measurements. In this work, we propose and demonstrate a simple secondary complementary balancing mechanism that minimizes the impact of imbalance on the imaging system. In our SPI setup, we use a silicon free-space balanced amplified photodetector with a 5 mm active-area diameter, which directly outputs the difference between the two optical input signals from the two reflection arms. Both simulation and experimental results demonstrate that secondary complementary balancing yields better cancellation of the direct-current components of the measurements and better image restoration quality.
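Why balance matters can be seen in a toy 1-D model. Assumptions: Hadamard rows h are mapped to DMD masks P = (1 + h)/2, the complementary arm sees 1 - P, both arms pick up the same additive background, and each arm has its own gain. With equal gains the background cancels and the differential signal equals the Hadamard coefficient <h, x>; with unequal gains a DC term leaks into the measurements. The secondary balancing mechanism itself is not modeled here.

```python
def sylvester_hadamard(n):
    """Natural-order (Sylvester) Hadamard matrix of size n, n a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def differential_measurement(x, h_row, gain_pos=1.0, gain_neg=1.0, background=0.0):
    """Balanced-detector output for one pattern and its complement."""
    pos = sum((1 + hv) / 2 * xv for hv, xv in zip(h_row, x)) + background  # arm seeing P
    neg = sum((1 - hv) / 2 * xv for hv, xv in zip(h_row, x)) + background  # arm seeing 1 - P
    return gain_pos * pos - gain_neg * neg


def reconstruct(x, **arm):
    """Full-sampling reconstruction x = H^T d / n (valid because H H^T = n I)."""
    n = len(x)
    h = sylvester_hadamard(n)
    d = [differential_measurement(x, row, **arm) for row in h]
    return [sum(h[i][j] * d[i] for i in range(n)) / n for j in range(n)]


scene = [0.2, 0.8, 0.5, 0.1]
balanced = reconstruct(scene, background=1.0)
unbalanced = reconstruct(scene, gain_pos=1.1, background=1.0)
print(max(abs(a - b) for a, b in zip(balanced, scene)))    # ~0 (rounding only)
print(max(abs(a - b) for a, b in zip(unbalanced, scene)))  # clearly nonzero
```

In this model a 10% gain mismatch leaves part of the background in every measurement, which the reconstruction turns into an error concentrated in the DC (first) coefficient.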
Single-pixel imaging (SPI) is very popular in subsampling applications, but the random measurement matrices it typically uses lead to measurement blindness as well as computation and storage difficulties, and also limit further reduction of the sampling rate. The deterministic Hadamard basis has become an alternative choice owing to its orthogonality and structural characteristics. There is evidence that ordering the Hadamard basis helps reduce the sampling rate further, and many orderings have therefore emerged, but their relations remain unclear and lack a unified theory. Given this, we propose a concept named selection history, which records the Hadamard spatial folding process, and build a model based on it to reveal the formation mechanisms of different orderings and to deduce how they convert into one another. We then propose a weight ordering of the Hadamard basis. Both numerical simulations and experimental results demonstrate that, with this weight ordering, the sampling rate, reconstruction time and matrix memory consumption are greatly reduced compared with traditional ordering methods. We therefore believe that this method may pave the way for real-time single-pixel imaging.
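As a concrete example of why ordering matters, the sketch below reorders a 1-D Sylvester Hadamard basis by sequency (the number of sign changes per row, i.e. the classical Walsh ordering) and compares truncated reconstructions at a 25% sampling rate. This is one well-known ordering, not the weight ordering or the selection-history model proposed here; the test signal is an arbitrary step.

```python
def sylvester_hadamard(n):
    """Natural-order (Sylvester) Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sign_changes(row):
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def sequency_order(h):
    """Walsh (sequency) ordering: sort rows by their number of sign changes."""
    return sorted(h, key=sign_changes)


def truncated_reconstruction(x, rows):
    """Keep only the given basis rows: x_hat = (1/n) * sum_i <h_i, x> h_i."""
    n = len(x)
    x_hat = [0.0] * n
    for row in rows:
        coeff = sum(hv * xv for hv, xv in zip(row, x))
        for j in range(n):
            x_hat[j] += coeff * row[j] / n
    return x_hat


def squared_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


signal = [1, 1, 1, 1, 2, 2, 2, 2]
natural = sylvester_hadamard(8)
walsh = sequency_order(natural)
k = 2  # 2 of 8 patterns, i.e. a 25% sampling rate
print(squared_error(truncated_reconstruction(signal, natural[:k]), signal))  # 2.0
print(squared_error(truncated_reconstruction(signal, walsh[:k]), signal))    # 0.0
```

With the same two measurements, the natural order spends its second pattern on the highest-frequency row, whereas the sequency order picks the low-frequency row that captures the step.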
Designing optimal reward functions is desirable but extremely difficult in reinforcement learning (RL). In modern complex tasks, sophisticated reward functions are widely used to simplify policy learning, yet even a tiny adjustment to them is expensive to evaluate because of the drastically increasing cost of training. To this end, we propose a hindsight reward tweaking approach, a novel paradigm for deep reinforcement learning that models the influence of reward functions within a near-optimal space. We simply extend the input observation with a condition vector that is linearly correlated with the effective environment reward parameters, and train the model in the conventional manner except that reward configurations are randomized, obtaining a hyper-policy whose characteristics are sensitively regulated over the condition space. We demonstrate the feasibility of this approach and study one of its potential applications, policy performance boosting, on multiple MuJoCo tasks.
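A minimal sketch of the conditioning idea, under stated assumptions: the environment reward is a weighted sum of reward terms, the weights are resampled every episode from hypothetical ranges (`REWARD_BOUNDS`), and the condition vector is an affine (hence linearly correlated) rescaling of those weights into [-1, 1] that is appended to the observation. The reward terms and ranges are illustrative, not taken from the paper or from MuJoCo.

```python
import random

# Hypothetical reward terms, e.g. forward progress, control penalty, survival bonus.
REWARD_BOUNDS = [(0.5, 1.5), (0.0, 0.2), (0.5, 1.5)]


def sample_reward_params(rng):
    """Randomize the reward configuration (done once per training episode)."""
    return [rng.uniform(lo, hi) for lo, hi in REWARD_BOUNDS]


def condition_vector(params):
    """Affine map of each reward parameter into [-1, 1]."""
    return [2 * (p - lo) / (hi - lo) - 1 for p, (lo, hi) in zip(params, REWARD_BOUNDS)]


def conditioned_observation(obs, params):
    """Policy input: the raw observation extended with the condition vector."""
    return list(obs) + condition_vector(params)


def configured_reward(reward_terms, params):
    """Scalar reward under the sampled configuration."""
    return sum(w * r for w, r in zip(params, reward_terms))


rng = random.Random(0)
params = sample_reward_params(rng)
policy_input = conditioned_observation([0.1, -0.3], params)
reward = configured_reward([1.2, -0.5, 1.0], params)
print(policy_input, reward)
```

In this sketch, once training is done, the same hyper-policy can be queried with different condition vectors, which is what allows reward tweaks to be explored in hindsight without retraining.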
In this paper, we focus on sum-rate optimization in a multi-user millimeter-wave (mmWave) system with distributed intelligent reflecting surfaces (D-IRSs), where a base station (BS) communicates with users via multiple IRSs. The BS transmit beamforming, the IRS switch vector, and the IRS phase shifts are jointly optimized to maximize the sum-rate subject to minimum user rate, unit-modulus, and transmit power constraints. To solve the resulting non-convex optimization problem, we develop an efficient alternating optimization (AO) algorithm. Specifically, the non-convex problem is decomposed into three subproblems, which are solved alternately. The transmit beamforming at the BS and the phase shifts at the IRSs are obtained with successive convex approximation (SCA)-based algorithms, and a greedy algorithm is proposed to design the IRS switch vector. The computational complexity of the proposed AO algorithm is analyzed theoretically. Numerical results show that the D-IRS-aided scheme can significantly improve the sum-rate and energy efficiency.
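The greedy design of the IRS switch vector can be sketched generically as below. This is one common greedy variant, assumed for illustration: start with all IRSs off and repeatedly switch on the IRS that gives the largest gain, stopping when no single switch improves the objective. The `objective` callable is hypothetical; within the AO framework it would evaluate the sum-rate or energy efficiency after re-optimizing the beamforming and phase shifts. The toy objective is invented purely for the example.

```python
from typing import Callable, List, Tuple


def greedy_switch(num_irs: int, objective: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Greedily switch IRSs on while the objective keeps improving."""
    switch = [0] * num_irs
    best = objective(switch)
    while True:
        candidates = []
        for k in range(num_irs):
            if switch[k] == 0:
                trial = switch.copy()
                trial[k] = 1
                candidates.append((objective(trial), k))
        if not candidates:
            break  # every IRS is already on
        value, k = max(candidates)
        if value <= best:
            break  # no single additional IRS helps
        switch[k] = 1
        best = value
    return switch, best


# Toy objective: per-IRS gains minus a fixed cost for each active IRS (illustrative only).
GAINS = [3.0, 1.0, 0.2]
COST_PER_ACTIVE_IRS = 0.5


def toy_objective(switch: List[int]) -> float:
    return sum(g * s for g, s in zip(GAINS, switch)) - COST_PER_ACTIVE_IRS * sum(switch)


print(greedy_switch(len(GAINS), toy_objective))  # ([1, 1, 0], 3.0)
```

Each greedy pass needs at most one objective evaluation per inactive IRS, so the search costs O(K^2) evaluations for K IRSs instead of the 2^K needed for exhaustive search.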