Shitz




Abstract:Acquiring accurate channel state information (CSI) at an access point (AP) is challenging for wideband millimeter wave (mmWave) ultra-massive multiple-input and multiple-output (UMMIMO) systems, due to the high-dimensional channel matrices, hybrid near- and far- field channel feature, beam squint effects, and imperfect hardware constraints, such as low-resolution analog-to-digital converters, and in-phase and quadrature imbalance. To overcome these challenges, this paper proposes an efficient downlink channel estimation (CE) and CSI feedback approach based on knowledge and data dual-driven deep learning (DL) networks. Specifically, we first propose a data-driven residual neural network de-quantizer (ResNet-DQ) to pre-process the received pilot signals at user equipment (UEs), where the noise and distortion brought by imperfect hardware can be mitigated. A knowledge-driven generalized multiple measurement vector learned approximate message passing (GMMV-LAMP) network is then developed to jointly estimate the channels by exploiting the approximately same physical angle shared by different subcarriers. In particular, two wideband redundant dictionaries (WRDs) are proposed such that the measurement matrices of the GMMV-LAMP network can accommodate the far-field and near-field beam squint effect, respectively. Finally, we propose an encoder at the UEs and a decoder at the AP by a data-driven CSI residual network (CSI-ResNet) to compress the CSI matrix into a low-dimensional quantized bit vector for feedback, thereby reducing the feedback overhead substantially. Simulation results show that the proposed knowledge and data dual-driven approach outperforms conventional downlink CE and CSI feedback methods, especially in the case of low signal-to-noise ratios.




Abstract:We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms, encompassing the effects of channel fading and interference, for a broad spectrum of nonconvex loss functions. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of $\mathcal{O}( \ln{(T)} /{ T^{ 1 - \frac{1}{\alpha} } } )$, where $\alpha$ represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the $\mathcal{O}( 1/T )$ rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.
Abstract:In this paper, the problem of joint transmission and computation resource allocation for a multi-user probabilistic semantic communication (PSC) network is investigated. In the considered model, users employ semantic information extraction techniques to compress their large-sized data before transmitting them to a multi-antenna base station (BS). Our model represents large-sized data through substantial knowledge graphs, utilizing shared probability graphs between the users and the BS for efficient semantic compression. The resource allocation problem is formulated as an optimization problem with the objective of maximizing the sum of equivalent rate of all users, considering total power budget and semantic resource limit constraints. The computation load considered in the PSC network is formulated as a non-smooth piecewise function with respect to the semantic compression ratio. To tackle this non-convex non-smooth optimization challenge, a three-stage algorithm is proposed where the solutions for the receive beamforming matrix of the BS, transmit power of each user, and semantic compression ratio of each user are obtained stage by stage. Numerical results validate the effectiveness of our proposed scheme.


Abstract:Motivated by applications in large-scale and multi-agent reinforcement learning, we study the non-asymptotic performance of stochastic approximation (SA) schemes with delayed updates under Markovian sampling. While the effect of delays has been extensively studied for optimization, the manner in which they interact with the underlying Markov process to shape the finite-time performance of SA remains poorly understood. In this context, our first main contribution is to show that under time-varying bounded delays, the delayed SA update rule guarantees exponentially fast convergence of the \emph{last iterate} to a ball around the SA operator's fixed point. Notably, our bound is \emph{tight} in its dependence on both the maximum delay $\tau_{max}$, and the mixing time $\tau_{mix}$. To achieve this tight bound, we develop a novel inductive proof technique that, unlike various existing delayed-optimization analyses, relies on establishing uniform boundedness of the iterates. As such, our proof may be of independent interest. Next, to mitigate the impact of the maximum delay on the convergence rate, we provide the first finite-time analysis of a delay-adaptive SA scheme under Markovian sampling. In particular, we show that the exponent of convergence of this scheme gets scaled down by $\tau_{avg}$, as opposed to $\tau_{max}$ for the vanilla delayed SA rule; here, $\tau_{avg}$ denotes the average delay across all iterations. Moreover, the adaptive scheme requires no prior knowledge of the delay sequence for step-size tuning. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms, including TD learning, Q-learning, and stochastic gradient descent under Markovian sampling.




Abstract:Sixth-generation (6G) wireless communication systems, as stated in the European 6G flagship project Hexa-X, are anticipated to feature the integration of intelligence, communication, sensing, positioning, and computation. An important aspect of this integration is integrated sensing and communication (ISAC), in which the same waveform is used for both systems both sensing and communication, to address the challenge of spectrum scarcity. Recently, the orthogonal time frequency space (OTFS) waveform has been proposed to address OFDM's limitations due to the high Doppler spread in some future wireless communication systems. In this paper, we review existing OTFS waveforms for ISAC systems and provide some insights into future research. Firstly, we introduce the basic principles and a system model of OTFS and provide a foundational understanding of this innovative technology's core concepts and architecture. Subsequently, we present an overview of OTFS-based ISAC system frameworks. We provide a comprehensive review of recent research developments and the current state of the art in the field of OTFS-assisted ISAC systems to gain a thorough understanding of the current landscape and advancements. Furthermore, we perform a thorough comparison between OTFS-enabled ISAC operations and traditional OFDM, highlighting the distinctive advantages of OTFS, especially in high Doppler spread scenarios. Subsequently, we address the primary challenges facing OTFS-based ISAC systems, identifying potential limitations and drawbacks. Then, finally, we suggest future research directions, aiming to inspire further innovation in the 6G wireless communication landscape.


Abstract:In this paper, we quantitatively compare these two effective communication schemes, i.e., digital and analog ones, for wireless federated learning (FL) over resource-constrained networks, highlighting their essential differences as well as their respective application scenarios. We first examine both digital and analog transmission methods, together with a unified and fair comparison scheme under practical constraints. A universal convergence analysis under various imperfections is established for FL performance evaluation in wireless networks. These analytical results reveal that the fundamental difference between the two paradigms lies in whether communication and computation are jointly designed or not. The digital schemes decouple the communication design from specific FL tasks, making it difficult to support simultaneous uplink transmission of massive devices with limited bandwidth. In contrast, the analog communication allows over-the-air computation (AirComp), thus achieving efficient spectrum utilization. However, computation-oriented analog transmission reduces power efficiency, and its performance is sensitive to computational errors. Finally, numerical simulations are conducted to verify these theoretical observations.
Abstract:Stacked intelligent metasurfaces (SIM) are capable of emulating reconfigurable physical neural networks by relying on electromagnetic (EM) waves as carriers. They can also perform various complex computational and signal processing tasks. A SIM is fabricated by densely integrating multiple metasurface layers, each consisting of a large number of small meta-atoms that can control the EM waves passing through it. In this paper, we harness a SIM for two-dimensional (2D) direction-of-arrival (DOA) estimation. In contrast to the conventional designs, an advanced SIM in front of the receiver array automatically carries out the 2D discrete Fourier transform (DFT) as the incident waves propagate through it. As a result, the receiver array directly observes the angular spectrum of the incoming signal. In this context, the DOA estimates can be readily obtained by using probes to detect the energy distribution on the receiver array. This avoids the need for power-thirsty radio frequency (RF) chains. To enable SIM to perform the 2D DFT, we formulate the optimization problem of minimizing the fitting error between the SIM's EM response and the 2D DFT matrix. Furthermore, a gradient descent algorithm is customized for iteratively updating the phase shift of each meta-atom in SIM. To further improve the DOA estimation accuracy, we configure the phase shift pattern in the zeroth layer of the SIM to generate a set of 2D DFT matrices associated with orthogonal spatial frequency bins. Additionally, we analytically evaluate the performance of the proposed SIM-based DOA estimator by deriving a tight upper bound for the mean square error (MSE). Our numerical simulations verify the capability of a well-trained SIM to perform DOA estimation and corroborate our theoretical analysis. It is demonstrated that a SIM having an optical computational speed achieves an MSE of $10^{-4}$ for DOA estimation.
Abstract:The rise of Artificial Intelligence (AI) has revolutionized numerous industries and transformed the way society operates. Its widespread use has led to the distribution of AI and its underlying data across many intelligent systems. In this light, it is crucial to utilize information in learning processes that are either distributed or owned by different entities. As a result, modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes. In line with this goal, the latest AI models are frequently trained in a decentralized manner. Distributed learning involves multiple entities working together to make collective predictions and decisions. However, this collaboration can also bring about security vulnerabilities and challenges. This paper provides an in-depth survey on private knowledge sharing in distributed learning, examining various knowledge components utilized in leading distributed learning architectures. Our analysis sheds light on the most critical vulnerabilities that may arise when using these components in a distributed setting. We further identify and examine defensive strategies for preserving the privacy of these knowledge components and preventing malicious parties from manipulating or accessing the knowledge information. Finally, we highlight several key limitations of knowledge sharing in distributed learning and explore potential avenues for future research.
Abstract:This work explores the fundamental problem of the recoverability of a sparse tensor being reconstructed from its compressed embodiment. We present a generalized model of block-sparse tensor recovery as a theoretical foundation, where concepts measuring holistic mutual incoherence property (MIP) of the measurement matrix set are defined. A representative algorithm based on the orthogonal matching pursuit (OMP) framework, called tensor generalized block OMP (T-GBOMP), is applied to the theoretical framework elaborated for analyzing both noiseless and noisy recovery conditions. Specifically, we present the exact recovery condition (ERC) and sufficient conditions for establishing it with consideration of different degrees of restriction. Reliable reconstruction conditions, in terms of the residual convergence, the estimated error and the signal-to-noise ratio bound, are established to reveal the computable theoretical interpretability based on the newly defined MIP, which we introduce. The flexibility of tensor recovery is highlighted, i.e., the reliable recovery can be guaranteed by optimizing MIP of the measurement matrix set. Analytical comparisons demonstrate that the theoretical results developed are tighter and less restrictive than the existing ones (if any). Further discussions provide tensor extensions for several classic greedy algorithms, indicating that the sophisticated results derived are universal and applicable to all these tensorized variants.
Abstract:The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented under mild conditions on $f$. Under such conditions, the optimal measure is shown to be unique. Examples of the solution for particular choices of the function $f$ are presented. Previously known solutions to common regularization choices are obtained by leveraging the flexibility of the family of $f$-divergences. These include the unique solutions to empirical risk minimization with relative entropy regularization (Type-I and Type-II). The analysis of the solution unveils the following properties of $f$-divergences when used in the ERM-$f$DR problem: $i\bigl)$ $f$-divergence regularization forces the support of the solution to coincide with the support of the reference measure, which introduces a strong inductive bias that dominates the evidence provided by the training data; and $ii\bigl)$ any $f$-divergence regularization is equivalent to a different $f$-divergence regularization with an appropriate transformation of the empirical risk function.