Sherman
Abstract: This paper presents a deep unfolding-supported coordinated multipoint beam pattern synthesis (DUCoMP-BPS) scheme to overcome the high complexity, poor adaptability, and limited scalability of traditional cell-free anti-jamming beamforming. In the proposed design, access points (APs) independently determine analog beamforming using local angle information, while the central processing unit (CPU) performs cooperative digital beamforming with only a single AP-CPU interaction, significantly reducing fronthaul overhead. To further improve efficiency, a deep unfolding strategy replaces the costly step-size search in analog beamforming with a trainable parameter, so that an offline-trained complex-valued neural network enables fast and adaptive online inference. Simulation results show that the complexity of DUCoMP-BPS scales linearly with the number of APs, that the scheme reduces single-AP analog beamforming runtime by about 67% compared to conventional optimization, and that it achieves better nulling performance than purely data-driven approaches. Hardware feasibility is validated on an Advanced RISC Machine-Field Programmable Gate Array (ARM-FPGA) heterogeneous platform, where algorithm-hardware co-verification and hardware-software decoupling enable efficient parallelism and low-latency execution. Finally, anechoic chamber measurements under practical hardware imperfections confirm robust beamforming performance, demonstrating the strong potential of DUCoMP-BPS for real-world deployment.
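To make the deep-unfolding idea in the abstract above concrete, the following is a minimal sketch and not the DUCoMP-BPS algorithm itself. It unrolls a fixed number of projected-gradient iterations for a toy unit-modulus (analog) beamformer and gives each layer its own step size in place of a step-size search. The toy objective (gain toward one angle on a half-wavelength uniform linear array), the function names, and the step-size values are all assumptions for illustration; in a trained system the step sizes would come from offline training, which is omitted here.

```python
import cmath
import math


def steering_vector(num_antennas, angle_rad, spacing=0.5):
    """Far-field steering vector of a uniform linear array (spacing in wavelengths)."""
    return [cmath.exp(2j * math.pi * spacing * n * math.sin(angle_rad))
            for n in range(num_antennas)]


def array_gain(weights, steer):
    """Beamforming gain |a^H w|^2 toward the direction described by `steer`."""
    return abs(sum(a.conjugate() * w for a, w in zip(steer, weights))) ** 2


def unfolded_analog_beamformer(steer, step_sizes):
    """Run one projected-gradient layer per entry of `step_sizes`.

    Each layer takes a gradient-ascent step on |a^H w|^2 and then projects
    every weight back onto the unit circle (phase-only constraint).
    """
    weights = [1 + 0j] * len(steer)
    for mu in step_sizes:  # mu plays the role of a learned per-layer parameter
        inner = sum(a.conjugate() * w for a, w in zip(steer, weights))
        stepped = [w + mu * a * inner for w, a in zip(weights, steer)]
        weights = [s / abs(s) for s in stepped]
    return weights


# Hypothetical step sizes standing in for values learned offline.
LEARNED_STEPS = [0.05, 0.1, 0.2]
steer = steering_vector(8, math.radians(20))
initial_gain = array_gain([1 + 0j] * 8, steer)
final_weights = unfolded_analog_beamformer(steer, LEARNED_STEPS)
final_gain = array_gain(final_weights, steer)
```

Because the number of layers is fixed, the online cost is a known number of multiply-accumulate passes, which is the property that makes unfolded solvers attractive for low-latency hardware.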
Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is a key enabling technology for sixth-generation (6G) communication systems. Nevertheless, the increase in array aperture and signal bandwidth brings new challenges to wideband channel estimation in XL-MIMO systems. Motivated by recent advances in deep generative modeling, we propose a diffusion model-based method for near-field wideband channel estimation in XL-MIMO systems. We first analyze the statistical correlation of the wideband channel and show that the near-field wideband channel exhibits both spatial non-stationarity and beam split effects. Based on these observations, the channel estimation problem is formulated as a Bayesian posterior inference task, in which a diffusion model is employed to learn the prior distribution of the channel. To further enhance the representation of complex spatial-frequency channel structures, we design a denoising network with a multi-scale attention mechanism. In particular, the network extracts multi-scale spatial-frequency features via parallel convolutional branches with different receptive fields, and combines feature attention and spatial attention modules to adaptively emphasize critical channel features. This design enables more accurate modeling of near-field wideband channel distributions and consequently improves channel estimation performance. Experimental results demonstrate that the proposed method is more robust than existing baseline schemes for XL-MIMO wideband channel estimation under different experimental settings.
Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is a key enabler for sixth-generation (6G) communications. However, near-field channel estimation is particularly challenging due to spherical-wave propagation and spatial non-stationarity. To tackle this challenge, we propose a structured sparse Bayesian learning framework with adaptive dictionary updating for near-field non-stationary channel estimation. Specifically, the proposed method iteratively updates the distance parameters within an adaptive dictionary, thereby enhancing the representation capability without increasing the dictionary size. Moreover, we develop a hierarchical prior model that jointly captures polar-domain sparsity and structured dependency, enabling efficient Bayesian inference. Simulation results demonstrate that the proposed approach outperforms existing polar-domain dictionary-based methods while achieving low dictionary overhead.
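The role of the distance parameter in a polar-domain dictionary can be illustrated with a small sketch. It builds a spherical-wave atom for a half-wavelength uniform linear array and picks, from a short candidate list, the distance whose atom correlates best with an observation. This crude candidate search only stands in for the iterative distance update mentioned in the abstract above, whose actual update rule is not given here; the function names, the 0.01 m wavelength, and the candidate distances are assumptions.

```python
import cmath
import math


def near_field_atom(num_antennas, angle_rad, distance, wavelength=0.01):
    """Spherical-wave steering vector for a half-wavelength ULA centred at the origin."""
    spacing = wavelength / 2
    atom = []
    for n in range(num_antennas):
        pos = (n - (num_antennas - 1) / 2) * spacing
        # Exact distance from the source to antenna n (law of cosines).
        r_n = math.sqrt(distance ** 2 + pos ** 2
                        - 2 * distance * pos * math.sin(angle_rad))
        atom.append(cmath.exp(-2j * math.pi * (r_n - distance) / wavelength))
    return atom


def refine_distance(observation, angle_rad, candidates, wavelength=0.01):
    """Return the candidate distance whose atom correlates best with the observation."""
    def correlation(distance):
        atom = near_field_atom(len(observation), angle_rad, distance, wavelength)
        return abs(sum(a.conjugate() * y for a, y in zip(atom, observation)))
    return max(candidates, key=correlation)


true_angle = math.radians(10)
observation = near_field_atom(64, true_angle, distance=10.0)  # noise-free toy input
best = refine_distance(observation, true_angle, candidates=[5.0, 10.0, 20.0])
```

Updating the distance of an existing atom, rather than adding atoms on a finer distance grid, is what keeps the dictionary size fixed.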
Abstract: With 6G evolving towards intelligent network autonomy, artificial intelligence (AI)-native operations are becoming pivotal. Wireless networks continuously generate rich and heterogeneous data, which inherently exhibits spatio-temporal graph structure. However, limited radio resources result in incomplete and noisy network measurements. This challenge is further intensified when a target variable and its strongest correlates are missing over contiguous intervals, forming systemic blind spots. To tackle this issue, we propose RieIF (Knowledge-driven Riemannian Information Flow), a geometry-consistent framework that incorporates knowledge graphs (KGs) for robust spatio-temporal graph signal prediction. For analytical tractability within the Fisher-Rao geometry, we project the input from a Riemannian manifold onto a positive unit hypersphere, where angular similarity is computationally efficient. This projection is implemented via a graph transformer, using the KG as a structural prior to constrain attention and generate a micro stream. Simultaneously, a Long Short-Term Memory (LSTM) model captures temporal dynamics to produce a macro stream. Finally, the micro stream (highlighting geometric shape) and the macro stream (emphasizing signal strength) are adaptively fused through a geometric gating mechanism for signal recovery. Experiments on three wireless datasets show consistent improvements under systemic blind spots, including up to 31% reduction in root mean squared error and up to 3.2 dB gain in recovery signal-to-noise ratio, while maintaining robustness to graph sparsity and measurement noise.
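The link between Fisher-Rao geometry and the positive unit hypersphere can be sketched with the standard square-root embedding of a discrete distribution: normalizing a non-negative vector and taking element-wise square roots yields a unit-norm point, and the Fisher-Rao distance between two distributions becomes twice the angle between their embedded points. The sketch below assumes this textbook embedding; how RieIF actually realizes the projection (through its graph transformer) is not reproduced, and all function names are illustrative.

```python
import math


def to_positive_sphere(values):
    """Map a non-negative vector with positive sum to the positive unit hypersphere."""
    total = sum(values)
    return [math.sqrt(v / total) for v in values]


def angular_similarity(u, v):
    """Cosine of the angle between two unit vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two discrete distributions via the sphere embedding."""
    cosine = angular_similarity(to_positive_sphere(p), to_positive_sphere(q))
    cosine = min(1.0, max(-1.0, cosine))  # guard against rounding outside [-1, 1]
    return 2 * math.acos(cosine)


point = to_positive_sphere([2.0, 1.0, 1.0])
norm_squared = sum(x * x for x in point)
farthest = fisher_rao_distance([1.0, 0.0], [0.0, 1.0])
```

On the sphere the similarity is a single dot product, which is the computational convenience the abstract refers to.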
Abstract: The deployment of extremely large-scale antenna arrays (ELAAs) in sixth-generation (6G) communication systems introduces unique challenges for efficient near-field channel estimation. To tackle these issues, this paper presents a theory-guided approach that incorporates angular information into an attention-based estimation framework. A piecewise Fourier representation is proposed to implicitly encode the near-field channel's inherent nonlinearity, enabling the entire channel to be segmented into multiple subchannels, each mapped to the angular domain via the discrete Fourier transform (DFT). Then, we develop a joint subchannel-spatial-attention network (JSSAnet) to extract the spatial features both within and across subchannels. To theoretically guide the design of the joint attention mechanism, we derive upper and lower bounds based on approximation criteria and DFT quantization loss mitigation, respectively. Guided by both bounds, a JSSA layer of an attention block is constructed to assign independent and adaptive spatial attention weights to each subchannel in parallel. Subsequently, a feed-forward network (FFN) of an attention block further captures and refines the residual nonlinear dependencies across subchannels. Moreover, the proposed JSSA map is computed linearly via an element-wise product combining large-kernel convolutions (DLKC), maintaining strong contextual learning capability. Numerical results verify the effectiveness of embedding sparsity information into the attention network and demonstrate that JSSAnet achieves superior estimation performance compared with existing methods.
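The segmentation step described in the abstract above can be sketched as follows: the array channel is cut into equal-length subchannels, and each one is mapped to the angular domain with its own DFT. This is only an illustration of the piecewise idea under simple assumptions (a single-path spherical-wave channel, a half-wavelength uniform linear array, a unitary DFT, and equal-size segments); the function names and parameter values are hypothetical, and the attention network itself is not shown.

```python
import cmath
import math


def unitary_dft(x):
    """Unitary DFT of a short complex sequence (direct O(n^2) evaluation)."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [scale * sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def piecewise_dft(channel, num_subchannels):
    """Split the channel into equal segments and transform each one separately."""
    if len(channel) % num_subchannels != 0:
        raise ValueError("channel length must be divisible by num_subchannels")
    size = len(channel) // num_subchannels
    return [unitary_dft(channel[i * size:(i + 1) * size])
            for i in range(num_subchannels)]


def spherical_wave_channel(num_antennas, angle_rad, distance, wavelength=0.01):
    """Single-path near-field channel of a half-wavelength ULA centred at the origin."""
    spacing = wavelength / 2
    channel = []
    for n in range(num_antennas):
        pos = (n - (num_antennas - 1) / 2) * spacing
        r_n = math.sqrt(distance ** 2 + pos ** 2
                        - 2 * distance * pos * math.sin(angle_rad))
        channel.append(cmath.exp(-2j * math.pi * r_n / wavelength))
    return channel


channel = spherical_wave_channel(64, math.radians(15), distance=5.0)
subchannels = piecewise_dft(channel, num_subchannels=4)
energy_in = sum(abs(h) ** 2 for h in channel)
energy_out = sum(abs(v) ** 2 for sub in subchannels for v in sub)
```

Over a short segment the spherical wavefront is closer to planar, so each subchannel tends to be more concentrated in its own angular domain than the full array response would be under a single DFT.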
Abstract: Vision-language navigation (VLN) requires intelligent agents to navigate environments by interpreting linguistic instructions alongside visual observations, serving as a cornerstone task in Embodied AI. Current VLN research for unmanned aerial vehicles (UAVs) relies on detailed, pre-specified instructions to guide the UAV along predetermined routes. However, real-world outdoor exploration typically occurs in unknown environments where detailed navigation instructions are unavailable. Instead, only coarse-grained positional or directional guidance can be provided, requiring UAVs to autonomously navigate through continuous planning and obstacle avoidance. To bridge this gap, we propose AutoFly, an end-to-end Vision-Language-Action (VLA) model for autonomous UAV navigation. AutoFly incorporates a pseudo-depth encoder that derives depth-aware features from RGB inputs to enhance spatial reasoning, coupled with a progressive two-stage training strategy that effectively aligns visual, depth, and linguistic representations with action policies. Moreover, existing VLN datasets have fundamental limitations for real-world autonomous navigation, stemming from their heavy reliance on explicit instruction-following over autonomous decision-making and from insufficient real-world data. To address these issues, we construct a novel autonomous navigation dataset that shifts the paradigm from instruction-following to autonomous behavior modeling through: (1) trajectory collection emphasizing continuous obstacle avoidance, autonomous planning, and recognition workflows; (2) comprehensive real-world data integration. Experimental results demonstrate that AutoFly achieves a 3.9% higher success rate than state-of-the-art VLA baselines, with consistent performance across simulated and real environments.
Abstract: This paper proposes a user-centric split federated learning (UCSFL) framework for user-centric cell-free multiple-input multiple-output (CF-MIMO) networks to support split federated learning (SFL). In the proposed UCSFL framework, users deploy split sub-models locally, while complete models are maintained and updated at access point (AP)-side distributed processing units (DPUs), followed by a two-level aggregation procedure across DPUs and the central processing unit (CPU). Under standard machine learning (ML) assumptions, we provide a theoretical convergence analysis for UCSFL, which reveals that the AP-cluster size is a key factor influencing model training accuracy. Motivated by this result, we introduce a new performance metric, termed the latency-to-accuracy ratio, defined as the ratio of a user's per-iteration training latency to the weighted size of its AP cluster. Based on this metric, we formulate a joint optimization problem to minimize the maximum latency-to-accuracy ratio by jointly optimizing uplink power control, downlink beamforming, model splitting, and AP clustering. The resulting problem is decomposed into two sub-problems operating on different time scales, for which dedicated algorithms are developed to handle the short-term and long-term optimizations, respectively. Simulation results verify the convergence of the proposed algorithms and demonstrate that UCSFL effectively reduces the latency-to-accuracy ratio of the VGG16 model compared with baseline schemes. Moreover, the proposed framework adaptively adjusts splitting and clustering strategies in response to varying communication and computation resources. An MNIST-based handwritten digit classification example further shows that UCSFL significantly accelerates the convergence of the VGG16 model.
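The latency-to-accuracy ratio and the min-max objective defined in the abstract above reduce to a few lines of arithmetic. The sketch below assumes that the "weighted size" of a cluster is the sum of per-AP weights, which the abstract does not specify; the dictionary keys, weights, and latency values are made up for illustration.

```python
def latency_to_accuracy_ratio(latency_s, cluster_weights):
    """Per-iteration training latency divided by the weighted AP-cluster size."""
    weighted_size = sum(cluster_weights)  # assumed definition of "weighted size"
    if weighted_size <= 0:
        raise ValueError("a user needs a cluster with positive total weight")
    return latency_s / weighted_size


def worst_case_ratio(users):
    """The quantity the joint optimization would minimize: the largest ratio."""
    return max(latency_to_accuracy_ratio(u["latency_s"], u["cluster_weights"])
               for u in users)


# Hypothetical users: latency in seconds and one weight per serving AP.
users = [
    {"latency_s": 0.30, "cluster_weights": [1.0, 0.5]},  # ratio 0.30 / 1.5 = 0.2
    {"latency_s": 0.20, "cluster_weights": [0.5]},       # ratio 0.20 / 0.5 = 0.4
]
objective = worst_case_ratio(users)
```

In this toy case the second user sets the objective even though its latency is lower, because its smaller cluster gives it a larger ratio.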
Abstract: Phase synchronization among distributed transmission reception points (TRPs) is a prerequisite for coherent joint transmission and high-precision sensing in millimeter wave (mmWave) cell-free massive multiple-input multiple-output (MIMO) systems. This paper proposes a bidirectional calibration scheme and a calibration coefficient estimation method for phase synchronization, and presents a calibration coefficient phase tracking method using unilateral uplink/downlink channel state information (CSI). Furthermore, this paper uses reciprocity calibration to eliminate non-ideal factors in sensing and leverages sensing results to track the calibration coefficient phase in dynamic scenarios, so that communication and sensing each benefit the other. Simulation results demonstrate that the proposed method implements reciprocity calibration with lower overhead, enables coherent collaborative transmission, and removes non-ideal factors to reduce sensing error. Experimental results show that, in the mmWave band, over-the-air (OTA) bidirectional calibration enables coherent collaborative transmission for both collaborative TRPs and collaborative user equipments (UEs), achieving beamforming gain and long-time coherent sensing capabilities.
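The principle behind bidirectional calibration can be sketched with a common textbook model, which is an assumption here and not necessarily the model used in the paper: the channel measured from node A to node B is the product of A's transmit-chain response, a reciprocal propagation term, and B's receive-chain response, and likewise in the reverse direction. The ratio of the two measurements then cancels the propagation term and leaves a calibration coefficient that depends only on the hardware chains. All names and values below are illustrative, and the example is noise-free.

```python
import cmath
import math
import random


def estimate_calibration_coefficient(h_ab, h_ba):
    """Least-squares estimate of c such that h_ab[k] ~= c * h_ba[k] for all k."""
    numerator = sum(b.conjugate() * a for a, b in zip(h_ab, h_ba))
    denominator = sum(abs(b) ** 2 for b in h_ba)
    return numerator / denominator


rng = random.Random(0)


def random_phase():
    """Unit-modulus hardware response with a random phase offset."""
    return cmath.exp(1j * rng.uniform(-math.pi, math.pi))


# Assumed model: transmit (t) and receive (r) chain responses of nodes A and B.
t_a, r_a, t_b, r_b = (random_phase() for _ in range(4))
# Reciprocal over-the-air term, one value per pilot resource.
propagation = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(16)]

h_ab = [t_a * g * r_b for g in propagation]  # measured at B
h_ba = [t_b * g * r_a for g in propagation]  # measured at A

c_true = (t_a * r_b) / (t_b * r_a)
c_est = estimate_calibration_coefficient(h_ab, h_ba)
```

Once the coefficient is known, CSI measured in one direction can be scaled to predict the other direction, which is what makes tracking from unilateral CSI plausible when only the phase of the coefficient drifts.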
Abstract: The move to next-generation wireless communications with extremely large-scale antenna arrays (ELAAs) brings communications into the radiative near-field (RNF) region, where distance-aware focusing is feasible. However, high-frequency RNF links are highly vulnerable to blockage in indoor environments dominated by half-space obstacles (walls, corners) that create knife-edge shadows. Conventional near-field focused beams offer high gain in line-of-sight (LoS) scenarios but suffer from severe energy truncation and effective-rank collapse in shadowed regions, often necessitating the deployment of auxiliary hardware such as Reconfigurable Intelligent Surfaces (RIS) to restore connectivity. We propose a beamforming strategy that exploits the auto-bending property of Airy beams to mitigate half-space blockage without additional hardware. The Airy beam is designed to "ride" the diffraction edge, accelerating its main lobe into the shadow to restore connectivity. Our contributions are threefold: (i) a Green's function-based RNF multi-user channel model that analytically reveals singular-value collapse behind knife-edge obstacles; (ii) an Airy analog beamforming scheme that optimizes the bending trajectory to recover the effective channel rank; and (iii) an Airy null-steering method that aligns oscillatory nulls with bright-region users to suppress interference in mixed shadow/bright scenarios. Simulations show that the proposed edge-riding Airy strategy achieves a Signal-to-Noise Ratio (SNR) improvement of over 20 dB and restores full-rank connectivity in shadowed links compared to conventional RNF focusing, virtually eliminating outage in geometric shadows and increasing multi-user spectral efficiency by approximately 35% under typical indoor ELAA configurations. These results demonstrate robust RNF multi-user access in half-space blockage scenarios without relying on RIS.
Abstract: Low-altitude wireless networks (LAWNs) are expected to play a central role in future 6G infrastructures, yet uplink transmissions of uncrewed aerial vehicles (UAVs) remain vulnerable to eavesdropping due to their limited transmit power, constrained antenna resources, and highly exposed air-ground propagation conditions. To address this fundamental bottleneck, we propose a flexible-duplex cell-free (CF) architecture in which each distributed access point (AP) can dynamically operate either as a receive AP for UAV uplink collection or as a transmit AP that generates cooperative artificial noise (AN) for secrecy enhancement. Such AP-level duplex flexibility introduces an additional spatial degree of freedom that enables distributed and adaptive protection against wiretapping in LAWNs. Building upon this architecture, we formulate a max-min secrecy-rate problem that jointly optimizes AP mode selection, receive combining, and AN covariance design. This tightly coupled and nonconvex optimization is tackled by first deriving the optimal receive combiners in closed form, followed by developing a penalty dual decomposition (PDD) algorithm with guaranteed convergence to a stationary solution. To further reduce computational burden, we propose a low-complexity sequential scheme that determines AP modes via a heuristic metric and then updates the AN covariance matrices through closed-form iterations embedded in the PDD framework. Simulation results show that the proposed flexible-duplex architecture yields substantial secrecy-rate gains over CF systems with fixed AP roles. The joint optimization method attains the highest secrecy performance, while the low-complexity approach achieves over 90% of the optimal performance with an order-of-magnitude lower computational complexity, offering a practical solution for secure uplink communications in LAWNs.
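The max-min secrecy-rate objective named in the abstract above can be illustrated with the usual definition of secrecy rate: the legitimate link's achievable rate minus the eavesdropper's, floored at zero. The sketch below only evaluates that metric for given SINR values; how the SINRs depend on AP modes, combiners, and AN covariances is the hard part of the paper and is not modeled. The function names and numbers are hypothetical.

```python
import math


def secrecy_rate(sinr_receiver, sinr_eavesdropper):
    """Secrecy rate in bit/s/Hz: legitimate rate minus eavesdropper rate, never negative."""
    legitimate = math.log2(1 + sinr_receiver)
    leaked = math.log2(1 + sinr_eavesdropper)
    return max(0.0, legitimate - leaked)


def min_secrecy_rate(links):
    """Worst secrecy rate over all UAVs, the quantity a max-min design would raise."""
    return min(secrecy_rate(s, e) for s, e in links)


# Hypothetical (receiver SINR, eavesdropper SINR) pairs in linear scale, one per UAV.
links = [(15.0, 1.0), (7.0, 3.0)]
worst = min_secrecy_rate(links)
```

Artificial noise from transmit-mode APs helps by lowering the eavesdropper SINR in this expression while, ideally, leaving the receiver SINR almost unchanged.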