Abstract:Accurate channel state information (CSI) is vital for multiple-input multiple-output (MIMO) systems. However, superimposed pilots (SIP), which reduce overhead, introduce severe pilot contamination and data interference, complicating joint channel estimation and data detection. This paper proposes a conditional flow matching receiver (CFM-Rx), an unsupervised generative framework that learns directly from received signals, eliminating the need for labeled data and improving adaptability across diverse system settings. By leveraging flow-based generative modeling, CFM-Rx enables deterministic, low-latency inference and exploits model invertibility to capture the bidirectional nature of signal propagation. This framework unifies flow matching with score-based diffusion modeling via a moment-consistent ordinary differential equation (ODE), replacing stochastic differential equation (SDE) sampling with a deterministic and efficient process. Furthermore, it integrates receiver-side priors to ensure stable, data-consistent inference. Extensive simulation results across various MIMO configurations demonstrate that CFM-Rx consistently outperforms conventional estimators and state-of-the-art data-driven receivers, achieving notable gains in channel estimation accuracy and symbol detection robustness, particularly under severe pilot contamination.
Abstract:The expansion of the low-altitude economy is contingent on reliable cellular connectivity for unmanned aerial vehicles (UAVs). A key challenge in pre-flight planning is predicting communication link quality along proposed and pre-defined routes, a task hampered by sparse measurements that render existing radio map methods ineffective. This paper introduces a transfer learning framework for high-fidelity route-level radio map prediction. Our key insight is to leverage abundant crowdsourced ground signals as auxiliary supervision. To bridge the significant domain gap between ground and aerial data and address spatial sparsity, our framework learns general propagation priors from simulation, performs adversarial alignment of the feature spaces, and is fine-tuned on limited real UAV measurements. Extensive experiments on a real-world dataset from Meituan show that our method achieves over 50% higher accuracy in predicting Route RSRP compared to state-of-the-art baselines.
Abstract:Accurate precoding in massive multiple-input multiple-output (MIMO) frequency-division duplexing (FDD) systems relies on efficient channel state information (CSI) acquisition. End-to-end learning frameworks improve performance by jointly optimizing this process, but they lack scalability and fail to generalize across different system configurations, such as varying numbers of antennas and users. To overcome this limitation, we introduce WiFo-E, a wireless foundation model designed for scalable end-to-end precoding. WiFo-E employs multi-task pretraining on a diverse set of configurations to learn transferable representations of underlying wireless principles. Central to the model is a sparse Mixture-of-Experts (MoE) Transformer architecture, which mitigates task interference and enhances training efficiency by activating specialized parameter subsets adaptively. Extensive simulations demonstrate that WiFo-E outperforms conventional per-configuration training and shows strong generalization to unseen system configurations, providing a flexible and efficient foundation for adaptive massive MIMO precoding.
Abstract:The growing adoption of sensor-rich intelligent systems has boosted the use of multi-modal sensing to improve wireless communications. However, traditional methods require extensive manual design of data preprocessing, network architecture, and task-specific fine-tuning, which limits both development scalability and real-world deployment. To address this, we propose WiFo-M$^2$, a foundation model that can be easily plugged into existing deep learning-based transceivers for universal performance gains. To extract generalizable out-of-band (OOB) channel features from multi-modal sensing, we introduce ContraSoM, a contrastive pre-training strategy. Once pre-trained, WiFo-M$^2$ infers future OOB channel features from historical sensor data and strengthens feature robustness via modality-specific data augmentation. Experiments show that WiFo-M$^2$ improves performance across multiple transceiver designs and demonstrates strong generalization to unseen scenarios.
Abstract:Multi-user signal demodulation is critical to wireless communications, directly impacting transmission reliability and efficiency. However, existing demodulators underperform in generic multi-user environments: classical demodulators struggle to balance accuracy and complexity, while deep learning-based methods lack adaptability under heterogeneous configurations. Although diffusion models have been introduced for demodulation, their flexibility remains limited for practical use. To address these issues, this work proposes WiFo-MUD, a universal diffusion-based foundation model for multi-user demodulation. The model aligns inter-user signal-to-noise ratio imbalance and performs conditional denoising via a customized backbone. Furthermore, a communication-aware consistency distillation method and a dynamic user-grouping strategy are devised to enhance inference. WiFo-MUD achieves state-of-the-art results on large-scale heterogeneous datasets, demonstrating efficient inference and strong generalization across varying system configurations.
Abstract:In intelligent low-altitude networks, integrating monitoring tasks into communication unmanned aerial vehicles (UAVs) can consume resources and increase handoff latency for communication links. To address this challenge, we propose a strategy that enables a "double use" of UAVs, unifying the monitoring and relay handoff functions into a single, efficient process. Our scheme, guided by an integrated sensing and communication framework, coordinates these multi-role UAVs through a proactive handoff network that fuses multi-view sensory data from aerial and ground vehicles. A lightweight vehicle inspection module and a two-stage training procedure are developed to ensure monitoring accuracy and collaborative efficiency. Simulation results demonstrate the effectiveness of this integrated approach: it reduces communication outage probability by nearly 10% at a 200 Mbps requirement without compromising monitoring performance and maintains high resilience (86% achievable rate) even in the absence of multiple UAVs, outperforming traditional ground-based handoff schemes. Our code is available at the https://github.com/Jiahui-L/UAP.
Abstract:The massive scale of Wireless Foundation Models (FMs) hinders their real-time deployment on edge devices. This letter moves beyond standard knowledge distillation by introducing a novel Multi-Component Adaptive Knowledge Distillation (MCAKD) framework. Key innovations include a Cross-Attention-Based Knowledge Selection (CA-KS) module that selectively identifies critical features from the teacher model, and an Autonomous Learning-Passive Learning (AL-PL) strategy that balances knowledge transfer with independent learning to achieve high training efficiency at a manageable computational cost. When applied to the WiFo FM, the distilled Tiny-WiFo model, with only 5.5M parameters, achieves a 1.6 ms inference time on edge hardware while retaining over 98% of WiFo's performance and its crucial zero-shot generalization capability, making real-time FM deployment viable.
Abstract:Semantic communication (SemComm) has emerged as a new communication paradigm. To enhance efficiency, multiple-input-multiple-output (MIMO) technology has been further integrated into SemComm systems. However, existing MIMO SemComm systems assume perfect channel matrix estimation for channel-adaptive joint source-channel coding, which is impractical due to hardware and pilot overhead constraints. In this paper, we propose a semantic image transmission system with channel matrix and channel noise adaptation, named HANA-JSCC, to cope with channel estimation errors in MIMO systems. We propose a channel matrix adaptor that collaborates with the channel codec to adapt to misaligned channel state information, thereby mitigating the impact of estimation errors. Since the relationship between the estimated channel matrix and true channel matrix is ill-posed (one-to-many), we further introduce a two-stage training strategy with knowledge distillation to overcome the convergence difficulties caused by the ill-posed problem. Comparing with the state-of-the-art benchmarks, HANA-JSCC achieves $0.40\sim0.54$dB higher average performance across various noise and estimation error levels in various datasets.
Abstract:Integrated sensing and communication (ISAC) within sub-THz frequencies is crucial for future air-ground networks, but unique propagation characteristics and hardware limitations present challenges in optimizing ISAC performance while increasing operational latency. This paper introduces a multi-modal sensing fusion framework inspired by synesthesia of machine (SoM) to enhance sub-THz ISAC transmission. By exploiting inherent degrees of freedom in sub-THz hardware and channels, the framework optimizes the radio-frequency environment. Squint-aware beam management is developed to improve air-ground network adaptability, enabling three-dimensional dynamic ISAC links. Leveraging multi-modal information, the framework enhances ISAC performance and reduces latency. Visual data rapidly localizes users and targets, while a customized multi-modal learning algorithm optimizes the hybrid precoder. A new metric provides comprehensive performance evaluation, and extensive experiments demonstrate that the proposed scheme significantly improves ISAC efficiency.
Abstract:This paper investigates a heterogeneous multi-vehicle, multi-modal sensing (H-MVMM) aided online precoding problem. The proposed H-MVMM scheme utilizes a vertical federated learning (VFL) framework to minimize pilot sequence length and optimize the sum rate. This offers a promising solution for reducing latency in frequency division duplexing systems. To achieve this, three preprocessing modules are designed to transform raw sensory data into informative representations relevant to precoding. The approach effectively addresses local data heterogeneity arising from diverse on-board sensor configurations through a well-structured VFL training procedure. Additionally, a label-free online model updating strategy is introduced, enabling the H-MVMM scheme to adapt its weights flexibly. This strategy features a pseudo downlink channel state information label simulator (PCSI-Simulator), which is trained using a semi-supervised learning (SSL) approach alongside an online loss function. Numerical results show that the proposed method can closely approximate the performance of traditional optimization techniques with perfect channel state information, achieving a significant 90.6\% reduction in pilot sequence length.